
Essay

Realist review – a new method of systematic review designed for complex policy interventions

Ray Pawson, Trisha Greenhalgh1, Gill Harvey2, Kieran Walshe2

Department of Sociology and Social Policy, University of Leeds, Leeds; 1University College London, London; 2Manchester Business School, University of Manchester, Manchester, UK

Evidence-based policy is a dominant theme in contemporary public services but the practical realities and challenges involved in using evidence in policy-making are formidable. Part of the problem is one of complexity. In health services and other public services, we are dealing with complex social interventions which act on complex social systems – things like league tables, performance measures, regulation and inspection, or funding reforms. These are not ‘magic bullets’ which will always hit their target, but programmes whose effects are crucially dependent on context and implementation. Traditional methods of review focus on measuring and reporting on programme effectiveness, often find that the evidence is mixed or conflicting, and provide little or no clue as to why the intervention worked or did not work when applied in different contexts or circumstances, deployed by different stakeholders, or used for different purposes.

This paper offers a model of research synthesis which is designed to work with complex social interventions or programmes, and which is based on the emerging ‘realist’ approach to evaluation. It provides an explanatory analysis aimed at discerning what works for whom, in what circumstances, in what respects and how. The first step is to make explicit the programme theory (or theories) – the underlying assumptions about how an intervention is meant to work and what impacts it is expected to have. We then look for empirical evidence to populate this theoretical framework, supporting, contradicting or modifying the programme theories as it goes. The results of the review combine theoretical understanding and empirical evidence, and focus on explaining the relationship between the context in which the intervention is applied, the mechanisms by which it works and the outcomes which are produced. The aim is to enable decision-makers to reach a deeper understanding of the intervention and how it can be made to work most effectively.

Realist review does not provide simple answers to complex questions. It will not tell policy-makers or managers whether something works or not, but will provide the policy and practice community with the kind of rich, detailed and highly practical understanding of complex social interventions which is likely to be of much more use to them when planning and implementing programmes at a national, regional or local level.

Journal of Health Services Research & Policy Vol 10 Suppl 1, 2005: 21–34 © The Royal Society of Medicine Press Ltd 2005

Introduction

Realist review is a relatively new strategy for synthesizing research which has an explanatory rather than judgemental focus. It seeks to unpack the mechanism of how complex programmes work (or why they fail) in particular contexts and settings. Realism is a methodological orientation that has roots in philosophy1–4 and applications in fields as diverse as sociology,5 psychology,6 economics7 and evaluation,8 but is as yet largely untried as an approach to the synthesis of evidence in health care.

Compared with clinical treatments, which are conceptually simple and have generally been evaluated in randomized controlled trials (RCTs), the literature on health care management and policy interventions is epistemologically complex and methodologically diverse, which we argue makes it highly suited to realist review.

The quest to understand what works in social interventions involves trying to establish causal relationships. The hallmark of realist inquiry is its distinctive understanding of causality. The successionist model (which underpins clinical trials) holds that causality is established when the cause X is switched on (experiment) and effect Y follows. In contrast, the generative model of causality (which underpins realist enquiry) holds that, to infer a causal outcome (O) between two events (X and Y), one needs to understand the underlying mechanism (M) that connects them and the context (C) in which the relationship occurs.

Ray Pawson PhD, Reader in Social Research Methodology, Department of Sociology and Social Policy, University of Leeds, Leeds, UK. Trisha Greenhalgh MD, Professor of Primary Health Care, University College London, London, UK. Gill Harvey PhD, Senior Lecturer in Healthcare and Public Sector Management; Kieran Walshe PhD, Director and Professor of Health Policy and Management, Manchester Business School, Centre for Public Policy and Management, University of Manchester, Manchester, UK. Correspondence to: GH. Email: [email protected]


So, for example, in order to evaluate whether a training programme reduces unemployment (O), a realist would examine its underlying mechanisms M (e.g. have skills and motivation changed?) and its contiguous contexts C (e.g. are there local skill shortages and employment opportunities?). Realist evaluation is thus all about hypothesizing and testing such CMO configurations. Under realism, the basic evaluative question – what works? – changes to ‘what is it about this programme that works for whom in what circumstances?’
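The context–mechanism–outcome (CMO) configuration lends itself to a simple structured representation. A minimal sketch in Python (our illustration, not part of the realist method itself), using the training-programme example above:

from dataclasses import dataclass

@dataclass
class CMOConfiguration:
    """A realist hypothesis: in context C, mechanism M generates outcome O."""
    context: str    # C: the conditions in which the mechanism can fire
    mechanism: str  # M: the underlying process the intervention triggers
    outcome: str    # O: the pattern of results produced

# The training-programme example from the text, expressed as one
# hypothesised configuration to be tested against the evidence.
hypothesis = CMOConfiguration(
    context="local skill shortages and employment opportunities exist",
    mechanism="training changes participants' skills and motivation",
    outcome="unemployment among participants falls",
)
print(f"In context [{hypothesis.context}], mechanism [{hypothesis.mechanism}] "
      f"is hypothesised to yield [{hypothesis.outcome}]")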

This explanatory formula has been used both prospectively (in formative evaluations) and concurrently (in summative evaluations). In this paper, we show how it may be operated retrospectively in research synthesis. The realist approach has no particular preference for either quantitative or qualitative methods. Indeed, it sees merit in multiple methods, marrying the quantitative and qualitative, so that both the processes and impacts of interventions may be investigated. Throughout this paper, we support our arguments with reference to real policy issues that raise practical questions for the reviewer. Because realist review is especially appropriate for complex interventions, we have deliberately used complex examples. In particular, we repeatedly draw upon a published review on the public disclosure of performance data.9

The nature of interventions

The starting point for research synthesis in health care is the nature of the interventions that will be examined. Intervention is a useful catch-all term, but it conflates initiatives that are methodologically quite separate. Thus, a clinical treatment is not the same as a health care programme, which is not to be confused with health service delivery, which is a different animal from health policy. And so on. When reviewing research, the key task is to match review method to subject matter. We consider seven defining features of complex service interventions.

First, complex service interventions are theories. That is, they are always based on a hypothesis that postulates: if we deliver a programme in this way or we manage services like so, then this will bring about some improved outcome. Such conjectures are grounded on assumptions about what gives rise to poor performance, inappropriate behaviour and so on, and how changes may be made to these patterns. Broadly speaking, therefore, we should expect reviews to pick up, track and evaluate the programme theories that implicitly or explicitly underlie families of interventions.

Second, such interventions are active – that is, they achieve their effects via the active input of individuals (clinicians, educators, managers, patients). In randomized trials, human volition is seen as a contaminant.

The experimental propositions under test relate to whether the treatment (and the treatment alone) is effective. As well as random allocation of participants, safeguards such as the use of placebos and double blinding are utilized to protect this causal inference. The idea is to remove any shred of human intentionality from the investigation. Active programmes, by contrast, only work through the stakeholders’ reasoning, and knowledge of that reasoning is integral to understanding their outcomes. Broadly speaking, we should expect that, in tracking the successes and failures of interventions, reviewers will find at least part of the explanation in terms of the reasoning and personal choices of different actors and participants.

Third, intervention theories have a long journey. They begin in the heads of policy architects, pass into the hands of practitioners and managers, and (sometimes) into the hearts and minds of patients. Different groups and relationships will be crucial to implementation; sometimes the flow from strategists to frontline management will be the vital link; at other times the participation of the general public will be the key interchange. The success of an intervention thus depends on the cumulative success of the entire sequence of these mechanisms as the programme unfolds. Broadly speaking, then, we should expect reviews to explore the integrity of the implementation chain, examining which intermediate outputs need to be in place for successful final outcomes to occur, and noting and examining the flows, blockages and points of contention.

Let us introduce our main example: the policy of public disclosure of information on performance (hospital star ratings, surgeon report cards and so on). There are several distinct stages and stakeholders to work through for such an intervention to take effect. The first stage is problem identification, in which the performance in question is measured, rated and (sometimes) ranked. The second is public disclosure, in which information on differential performance is disclosed, published and disseminated. The third is sanction instigation, in which the broader community acts to boycott, censure, reproach or control the under-performing party. The fourth might be called miscreant response, in which failing parties are shamed, chastized, made contrite, and so improve performance in order to be reintegrated.

The key point is that the different theories underlying this series of events are all fallible. The intended sequence above may misfire at any point, leading to unintended outcomes as depicted in Figure 1. The initial performance measure may amount to problem misidentification if it is, for instance, not properly risk adjusted. Dissemination may amount to dissimulation if the data presented to the public are oversimplified or exaggerated. Wider public reactions may take the form of apathy or panic rather than reproach, and rather than being shamed into pulling up their socks, named individuals or institutions may attempt to resist, reject, ignore or actively discredit the official labelling.

[Figure 1 Programme theory for a policy of public disclosure of performance data (with intended and unintended outcomes): problem identification may misfire into problem misconstruction; public disclosure into message dissimulation; sanction instigation into sanction miscarriage; and miscreant response into miscreant resistance.]
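Figure 1’s chain of stages and failure modes can equally be written down as paired intended and unintended outcomes, which makes the cumulative-success assumption explicit. A hypothetical sketch (ours, for illustration only):

# Hypothetical sketch: the Figure 1 implementation chain as paired
# intended stages and the unintended outcomes each may misfire into.
implementation_chain = [
    ("problem identification", "problem misconstruction"),
    ("public disclosure", "message dissimulation"),
    ("sanction instigation", "sanction miscarriage"),
    ("miscreant response", "miscreant resistance"),
]

def trace(chain, misfires_at=None):
    """Walk the chain: the intervention succeeds only if every stage
    fires as intended (the cumulative-success assumption in the text)."""
    for index, (intended, unintended) in enumerate(chain):
        if misfires_at == index:
            return f"chain broke at '{intended}': {unintended}"
    return "all stages fired as intended"

print(trace(implementation_chain, misfires_at=1))  # message dissimulation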


The fourth feature of complex service interventions is that their implementation chains are non-linear and can even go into reverse. A top-down scheme can become bottom-up. For example, in hospital rating, a struggle may emerge between professional associations and management about the validity or fairness of the indicators. The actual intervention takes shape according to the power of the respective parties. Broadly speaking, then, we should expect the review to examine how the relative influence of different parties is able to affect and direct implementation.

A fifth feature of interventions is that they are fragile creatures, embedded in multiple social systems. Rarely, if ever, is a programme equally effective in all circumstances, because of the influence of context. The same sex education programme, for example, is likely to unfold very differently in a convent girls’ school than in a progressive mixed comprehensive school. A key requirement of realist inquiry is to take heed of the different layers of social reality that make up and surround interventions. The realist reviewer should expect the same intervention to meet with both success and failure (and all points in between) when applied in different contexts. He or she must contextualize any differences found between primary studies in terms of (for example) policy timing, organizational culture and leadership, resource allocation, staffing levels and capabilities, interpersonal relationships, and competing local priorities and influences (Figure 2).

[Figure 2 Intervention as the product of its context: the intervention sits within nested layers of social reality – individuals, interpersonal relations, institution and infrastructure.]

A sixth feature of interventions is that they are leaky and prone to be borrowed. When it comes to putting flesh on the bones of an intervention strategy, practitioners will consult with colleagues and cross-fertilize ideas. Indeed, such networking at middle management level is strongly encouraged in modern health services, as in the high-profile quality improvement collaboratives.10 Especially when it comes to ironing out snags, there will be a considerable amount of rubbernecking from scheme to scheme as stakeholders compare notes on solutions, and thereby change them. Reviewers must always beware of the so-called ‘label naivete’.11

The intervention to be reviewed will carry a title and that title will speak to a general and abstract programme theory that differs from the one practitioners and managers have implemented and empirical studies have evaluated. Broadly speaking, then, we should expect the same intervention to be delivered in a mutating fashion, shaped by refinement, reinvention and adaptation to local circumstances.

The final feature of complex service interventions is that they are open systems that feed back on themselves. As interventions are implemented, they change the conditions that made them work in the first place. Learning occurs that alters subsequent receptivity to interventions. Moreover, this evolution and adaptation may lead to unintended effects in the longer term – as when criminals get wise to new crime prevention devices and alter their behaviour accordingly, or when staff who favour particular management interventions preferentially apply to, or get retained and promoted by, particular programmes.

Realist review: theoretical and practical limitations

Complex service interventions, therefore, can be conceptualized as dynamic complex systems thrust amidst complex systems, relentlessly subject to negotiation, resistance, adaptation, leak and borrow, bloom and fade, and so on. Reviewing the effectiveness of such systems-within-systems places three important theoretical limitations on the reviewer:

(a) A limit on how much territory he or she can cover: An intervention may have multiple stages, each with its associated theory, and endless permutations of individual, interpersonal, institutional and infrastructural settings. The reviewer will need to prioritize the investigation of particular processes and theories in particular settings. In practical terms, the review question must be carefully articulated so as to prioritize which aspects of which interventions will be examined.

(b) A limit on the nature and quality of the information that he or she can retrieve. Empirical research studies will probably have focused on formal documentation (such as reports), tangible processes (such as the activities of steering groups) and easily measured outcomes (such as attendance figures). Informal (and sometimes overtly ‘off the record’) information relating to interpersonal relationships and power struggles, and the subtle contextual conditions that can make interventions float or sink, will be much harder to come by. In practical terms, the realist reviewer will need to draw judiciously upon a wide range of information from diverse primary sources.



(c) A limit on what he or she can expect to deliver in the way of recommendations. Hard and fast truths about what works must be discarded in favour of contextual advice in the general format: in circumstances such as A, try B, or when implementing C, watch out for D. Realist review delivers illumination rather than generalizable truths, and contextual fine-tuning rather than standardization; a schematic sketch of this advice format follows.
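The ‘in circumstances such as A, try B’ format can be pictured as a set of context-keyed decision points rather than a single verdict. A hypothetical sketch of that output format (the contexts and advice below are invented for illustration):

# Hypothetical sketch: realist conclusions as contextualised decision
# points, not a single "it works / it doesn't" verdict.
decision_points = [
    ("performance measures are accepted as fairly risk-adjusted",
     "expect professional cooperation; disclosure theories can fire"),
    ("media coverage reframes results as shame and failure",
     "watch for resistance and discrediting rather than improvement"),
]

def advise(observed_context: str) -> str:
    """Return the contextual advice matching an observed circumstance."""
    for context, advice in decision_points:
        if context == observed_context:
            return f"In circumstances such as '{context}', {advice}."
    return "No reviewed evidence speaks to this context."

print(advise("media coverage reframes results as shame and failure"))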

The realist template for systematic review

The steps of realist review are summarized in Box 1. The science of systematic review has moved on considerably from the conventional Cochrane review formula: ask a focused question, search for evidence, appraise papers, extract data, synthesize evidence to produce a ‘clinical bottom line’ and disseminate findings.12 A number of more flexible and sophisticated approaches have been incorporated by Cochrane methods groups to deal with qualitative (or the combination of qualitative and quantitative) evidence.13

Nevertheless, the conventional Cochrane headings, depicted broadly in Box 1, are a useful departure point from which to consider the steps of realist review.

Rather more substages are featured in realist mode and, undoubtedly, there is added complexity, largely due to the need to deconstruct interventions into component theories.

Although the steps have been presented as sequential, they are actually overlapping and iterative. For example, the search stage influences question refinement and vice versa, and the quality appraisal exercise occurs in parallel with (and is integral to) the synthesis stage. Realist review is about refining theories, and second thoughts can (and should) occur at any stage as new evidence emerges from the literature or peer review raises questions about the emerging explanations.

Identifying the review question

All reviews commence with an exercise in conceptual sharpening, attempting to define and refine precisely the question to be pursued in research synthesis. In a systematic review of drug therapy, the reviewer will define a precise target population (say, men with stage 1 or 2 prostate cancer), the dosage, duration, mode of administration and so on of the drugs in question, and the outcome measure(s) (progression of cancer, quality of life). Potential ambiguities are identified and the precise relationship to be investigated comes to be better defined. Decisions made at this stage are then enshrined in a systematic review protocol.

Box 1 Key steps in realist review

Step 1: Clarify scope
a. Identify the review question
   - Nature and content of the intervention
   - Circumstances or context for its use
   - Policy intentions or objectives
b. Refine the purpose of the review
   - Theory integrity – does the intervention work as predicted?
   - Theory adjudication – which theories fit best?
   - Comparison – how does the intervention work in different settings, for different groups?
   - Reality testing – how does the policy intent of the intervention translate into practice?
c. Articulate key theories to be explored
   - Draw up a ‘long list’ of relevant programme theories by exploratory searching (see Step 2)
   - Group, categorize or synthesize theories
   - Design a theoretically based evaluative framework to be ‘populated’ with evidence

Step 2: Search for evidence
a. Exploratory background search to ‘get a feel’ for the literature
b. Progressive focusing to identify key programme theories, refining inclusion criteria in the light of emerging data
c. Purposive sampling to test a defined subset of these theories, with additional ‘snowball’ sampling to explore new hypotheses as they emerged
d. Final search for additional studies when review near completion

Step 3: Appraise primary studies and extract data
a. Use judgement to supplement formal critical appraisal checklists, and consider ‘fitness for purpose’:
   - Relevance – does the research address the theory under test?
   - Rigour – does the research support the conclusions drawn from it by the researchers or the reviewers?
b. Develop ‘bespoke’ set of data extraction forms and notation devices
c. Extract different data from different studies to populate evaluative framework with evidence

Step 4: Synthesize evidence and draw conclusions
a. Synthesize data to achieve refinement of programme theory – that is, to determine what works for whom, how and under what circumstances
b. Allow purpose of review (see Step 1b) to drive the synthesis process
c. Use ‘contradictory’ evidence to generate insights about the influence of context
d. Present conclusions as a series of contextualized decision points of the general format ‘If A, then B’ or ‘In the case of C, D is unlikely to work’.

Step 5: Disseminate, implement and evaluate
a. Draft and test out recommendations and conclusions with key stakeholders, focusing especially on levers that can be pulled in here-and-now policy contexts
b. Work with practitioners and policy-makers to apply recommendations in particular contexts
c. Evaluate in terms of extent to which programmes are adjusted to take account of contextual influences revealed by the review: the ‘same’ programme might be expanded in one setting, modified in another and abandoned in another


The realist approach also starts with a sharpening of the question to be posed, but the task is very different, because of the different nature of the interventions studied (complex and multiply embedded rather than simple and discrete) and the different purpose of the review (explanation rather than final judgement). These differences bite enormously hard at stage one of a realist review. Both reviewers and commissioners should anticipate focusing the question to be a time-consuming and ongoing task, often continuing to the half-way mark and even beyond in a rapid review. We have previously referred to this stage in the synthesis of complex evidence as ‘the swamp’, and advised that acknowledging its uncertain and iterative nature is critical to the success of the review process.14

One important aspect of conceptual ground clearing between commissioners and reviewers is to agree the explanatory basis of the review – that is, what is it about this kind of intervention that works, for whom, in what circumstances, in what respects and why? Rather than commissioners merely handing over an unspecified bundle of such questions, and rather than reviewers picking up those sticks with which they feel most comfortable, both parties should (a) work together on a ‘pre-review’ stage in which some of these particulars will be negotiated and clarified; and (b) periodically revisit, and if necessary revise, the focus of the review as knowledge begins to emerge.

Refining the purpose of the review

There are several variations on the explanatory theme, each operating under the overarching principle of concentrating attention on a finite set of programme theories that have clear policy import and offer the potential for change. At least four different takes are possible:

i) Reviewing for programme theory integrity

This strategy is proposed by theories of change evaluations, which view complex programmes as sequences of stepping stones or staging posts, each intermediate output having to be achieved in order to successfully reach the intended outcome. Such evaluations pursue programmes in real time, searching out flows and blockages in the sequence of theories.15,16 A ‘theory integrity’ strategy may be adapted for research synthesis, with the aim of discovering what have been the typical weak points and stumbling blocks in the history of the intervention under review.

ii) Reviewing to adjudicate between rival programme theories

Many interventions are unleashed in the face of some ambiguity about how they will actually operate, and a realist review can take on the job of uncovering evidence to adjudicate between rival theories of how they work – or, more commonly, of identifying which permutation of mechanisms is most successful. This strategy of using evidence to adjudicate between theories is the hallmark of realist inquiry and, some would say, of the scientific method itself.17

iii) Reviewing the same theory in comparative settings

This strategy is the underlying rationale for realist evaluation – in which it is assumed that programmes only work for certain participants in certain circumstances.8 A review will uncover many studies of the ‘same’ intervention in different settings, and realist review can profitably take on the task of trying to identify patterns of winners and losers. It will be difficult to discern such success and failure across the entire complexity of an intervention. Accordingly, the ‘for whom and in what circumstances’ exercise might be best conducted component by component, beginning in our example with a study of the conditions in which performance indicators find acceptance or resistance.

iv) Reviewing official expectations against actual practice

This strategy is, of course, a special application of (ii). If you think of policy-thought pieces as a likely source of illumination on the potential theories driving an intervention, you know that friends and foes of the intervention are likely to highlight key differences in the underlying process. Typically, there is also opposition between policy-makers and practitioners on the best way to mount interventions. As we all know, these are common grounds for political friction, but also splendid sources of rival theories that may be put to empirical adjudication via a realist review. Pawson’s study of the effectiveness of Megan’s Law used the notification and disclosure theories embodied in state legislation as a benchmark against which to compare its actual operation.18

Although it is essential to clarify at some stage which of these approaches will drive the review, it may not be possible to make a final decision until the review is well underway. Certainly, we counsel strongly against the pre-publication of realist review ‘protocols’ in which both the review question and the purpose of the review must be set in stone before the review itself commences.

Articulating key theories to be explored

To set the stage for the review proper, it is essential to articulate the body of working theories that lie behind the intervention. The reviewer must temporarily adopt a primary research rather than synthesis role and scavenge ideas from a number of sources to produce a long list of key intervention theories from which the final short list will be drawn up.

An important initial strategy is discussion with commissioners, policy-makers and other stakeholders to tap into official conjecture and expert framing of the problem. But, at some point, the reviewer must enter the literature with the explicit purpose of searching it for the theories, the hunches, the expectations, the rationales and the rationalizations for why the intervention might work. As we have illustrated, interventions never run smoothly. They are subject to unforeseen consequences due to resistance, negotiation, adaptation, borrowing, feedback and, above all, context. The data to be collected relate not to the efficacy of the intervention, but to the range of prevailing theories and explanations of how it was supposed to work – and why things went wrong.

We can demonstrate this idea using our example of the public disclosure of health care performance (Figure 3). Public disclosure consists of several activities (and thus theories), beginning with the production of performance measures. Classifications are made at the individual and institutional levels, and cover anything from report cards on individual surgeons to hospital star ratings. Such classifications are not neutral or natural; they are made for a purpose and that purpose is to identify clearly the difference between good and poor performances. Performance covers a host of feats and a multitude of sins, and so the classification has to decide which configuration of indicators (such as patient turnover, mortality rates, satisfaction levels, waiting times, cleanliness measures) constitutes satisfactory levels of accomplishment.

[Figure 3 An initial ‘theory map’ of the public disclosure of health care information. Theory one, Classification: the quality of particular aspects of health care can be monitored and measured to provide valid and reliable rankings of comparative performance. Theory two, Disclosure: information on the comparative performance and the identity of the respective parties is disclosed and publicised through public media. Theory three, Sanction: members of the broader health community act on the disclosure in order to influence subsequent performance of named parties; alternative sanctions 3a–3d operate through ‘regulation’, ‘consumer choice’, ‘purchasing decisions’ or ‘shaming’. Theory four, Response: parties subject to the public notification measures will react to the sanctions in order to maintain position or improve performance. Theory five, Ratings resistance: the authority of the performance measures can be undermined by the agents of those measured claiming that the data are invalid and unreliable. Theory six, Rival framing: the ‘expert framing’ assumed in the performance measure is distorted through the application of the media’s ‘dominant frames’. Theory seven, Measure manipulation: response may be made to the measurement rather than its consequences, with attempts to outmanoeuvre the monitoring apparatus.]

The key theories illustrated in Figure 3 are as follows:

• Theory one is about the currency through which creditworthiness (or blameworthiness) is assigned. Having a classification based on a particular aspect of measured performance suggests causal agency (you are good or poor because you have scored X). As a consequence, these are contentious decisions and the choice of currency weighs heavily on the policy-maker. Even the choice of unit of analysis is problematic. A breakdown by surgeon will assign responsibility for care to the individual, whereas a classification by hospital suggests that success or failure lies with the institution.

• Theory two is about the impact of publicity: ‘Sunlight is the best of disinfectant: electric light the most efficient policeman’.19 Getting the theory into practice involves multiple choices about what information is released, through what means, to whom. But data never speak for themselves (especially if they cover the performance of multiple units on multiple indicators). Some reports offer raw scores, some compress the data into simple rankings, some include explanations and justifications, and some draw inferences about what is implied in the records.9 Dissemination practices also vary widely from the passive (the report is published and therefore available) to the active (press conferences, media releases, etc.), and rely on further theories (see below).

• Theory three is about actions by recipients of the message and the impact of those actions. Because information is now in the public domain, a wider group of stakeholders is encouraged and empowered to have a say on whether the reported performance is adequate. The broader community is thus expected to act on the disclosure by way of shaming or reproaching or boycotting or restraining or further monitoring the failing parties (and taking converse actions with high-flyers). Again, there are multiple choices to be made: which members of the wider community are the intended recipients? How are they presumed to marshal a sanction? A range of contending response theories is possible.9 One idea (3a in Figure 3) is that public release is used to support closer regulation of public services. The performance data provide an expert and dispassionate view of a problem, which signals those in need of greater supervision and/or replacement. A second theory (3b) is that disclosure stimulates consumer choice. The informed purchaser of health care is able to pick and choose. Lack of demand for their services drives the subsequent improvement of poor performers. Theory 3c is a variant of this supply and demand logic, arguing that the key consumers are not individuals but institutions (fundholders, managed care organizations, primary care organizations). Theory 3d reckons that naming and shaming is the working mechanism and that under-performers respond to the jolt of negative publicity. All these competing theories will need to be explored in the review. Incidentally, the actual review, when we get that far, might declare on ‘none-of-the-above’ and discover that theory 3e, about the procrastination, indifference and apathy of the wider public, is the one that tends to hold sway.

• Theory four concerns the actions of those on the receiving end of the disclosure. The basic expectation is that good performers will react to disclosure by seeking to maintain position and that miscreants will seek reintegration.20 The precise nature of the latter’s reaction depends, of course, on the nature of the sanction applied at Step 3. If they are in receipt of a decline in purchasing choice, it is assumed that they will attempt to improve their product; if they are shamed, it is assumed they will feel contrite; if they are subject to further regulation, it is assumed that they will take heed of the submissions that flow from tighter inspection and supervision.

Theories five to seven in Figure 3 arise from another major phase in the initial theory mapping. The four theories discussed arose from the expert framing of the programme, as described in Figure 1. However, interventions rarely run to plan. It is highly likely, therefore, that in the initial trawl for the theories underlying the intervention being studied, the researcher will encounter rival conjectures about how a scheme might succeed or fail. Three of these rival conjectures are illustrated by theories five, six and seven on the preliminary theory map in Figure 3.

• Theory five is about resistance to public disclosure. It recognizes that, be they surgeons or hospitals, those on the receiving end of public disclosure are likely to challenge its application and authority. This can occur in respect of the initial classification, as when the participants’ professional bodies challenge the performance measures on the grounds that they are not properly risk-adjusted or fail to measure added value. The success or failure of such resistance will reverberate through the remainder of the implementation chain, confronting the reviewer with the difficult question of testing a further theory on whether schemes with accepted and adjusted performance measures are more prone to meet with success.

• Theory six postulates how a process external to the intervention might impinge on its potential success. To go public is to let a cat out of the bag that is not entirely within the control of those compiling the performance data. The measures are applied with a specific problem and subsequent course of action in mind (the expert framing of the issue). Whether these sit easy with ‘media frames’ is a moot question.21 The presentational conventions for handling such stories are likely to revolve around shame and failure and these, rather than the intended reintegration message, may be the ones that get heard.

• Theory seven is another potential mechanism for resisting the intervention. This occurs when the whole measurement apparatus is in place and the public is primed to apply sanctions. It consists of gaming, discovering ways to outmanoeuvre the measures. This may involve marshalling the troops to optimal performance on the day of assessment or applying effort to activities gauged at the expense of those left unmonitored. As a result, the reviewer must try to estimate the extent to which any reported changes under public disclosure are real or ersatz.

These seven theories, which will each require separate testing in the next stage of the review, are not an exhaustive set of explanations. For instance, a programme that mounted rescue packages for failing hospitals following the collection and public disclosure of performance information (for example, the work of the NHS Modernisation Agency’s Performance Development Team22) would put in place a whole raft of further procedures, whose underlying theories could be unpacked and pursued.
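During a review, a theory map like Figure 3 has to be kept workable as theories are added, grouped and short-listed. One hypothetical way to hold it (our sketch, not a prescribed tool) is as structured data, with rival variants attached to the stage they contest:

# Hypothetical sketch: Figure 3's theory map held as a data structure.
# Each entry: theory label -> (summary, rival variants if any).
theory_map = {
    "1 Classification": ("quality can be measured into valid comparative rankings", []),
    "2 Disclosure": ("comparative performance is publicised through public media", []),
    "3 Sanction": ("the wider community acts on the disclosure",
                   ["3a regulation", "3b consumer choice",
                    "3c purchasing decisions", "3d shaming"]),
    "4 Response": ("named parties react to maintain position or improve", []),
    "5 Ratings resistance": ("those measured contest the validity of the data", []),
    "6 Rival framing": ("media frames distort the intended expert framing", []),
    "7 Measure manipulation": ("parties game the measures themselves", []),
}

for label, (summary, rivals) in theory_map.items():
    print(f"Theory {label}: {summary}")
    for rival in rivals:
        print(f"  variant {rival}")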

In general terms, it should be clear that the ideas unearthed in theory mapping will be many and varied. They might stretch from macro theories about health inequalities to meso theories about organizational capacity to micro theories about employee motivation. They might stretch through time, relating, for example, to the impact of long-term intervention fatigue. This abundance of ideas provides the final task for this stage, namely to decide upon which combinations and which subset of theories are going to feature on the short list. A simple but key principle is evident – that comprehensive reviews are impossible and that the task is to prioritize and agree on which programme theories are to be inspected.

Before moving on to the next stage, it is worth reflecting that the initial phase of theory stalking and sifting has utility in its own right. There is a resemblance here to the strategies of concept mapping in evaluation and the use of logic models in management.23 Many interventions are built via thumbnail sketches of programme pathways, such as in Figure 3. Identifying the full range of theories in a mature programme lays bare for managers and policy-makers the multitude of decision points in an intervention and the thinking that has gone into them.

It is also worth reiterating that there is no single formula for cutting through this complexity and expressing the hypotheses to be explored. The reviewer’s work will sometimes consist of comparing the intervention in different locations. At other times, it will involve tracking it through its various phases, and at other times it will involve arbitrating between the views of different stakeholders. This emphasizes again that realist review is not a review technique, but a review logic.

Given the processes involved in clarifying the scope of the review and the importance of this stage, user participation is essential. Above all others, this stage is not a matter of abstract data extraction. Rather, it requires active and ongoing dialogue with the people who develop and deliver the interventions, since they are the people who embody and enact the theories that are to be identified, unpacked and tested.

Having identified an area of inquiry and rooted out and prioritized the key theories to review, the review proper can commence, beginning with the gathering of empirical evidence, followed by the testing of key theories against this evidence.

Searching for relevant evidence

Given that realist review seeks to explore and contextualize the intervention in multiple social settings, the search stage is intricate, lengthy and closely interwoven with other stages. It is useful to think of the search as having four separate components, though this implies a neatness and linearity not achieved in real life:

1. A background search to get a feel for the literature – what is there, what form it takes, where it seems to be located, how much there is. This is almost the very first thing the reviewer should do.

2. Progressive focusing to identify the programme theories – locating the administrative thinking, policy history, legislative background, key points of contention in respect of the intervention, and so on. This has been described in the previous section concerned with clarifying the scope of the review.

3. A search for empirical evidence to test a subset of these theories – locating apposite evidence from a range of primary studies using a variety of research strategies. This is in some senses the search proper, in which the reviewer has moved on from browsing and for which a formal audit trail should be provided in the write-up.

4. A final search once the synthesis is almost complete – to seek out additional studies that might further refine the programme theories that have formed the focus of analysis.

The search stage of realist review differs in two key respects from that of a conventional systematic review. First, there is not a finite set of relevant papers which can be defined and then found – there are many more potentially relevant sources of information than any review could practically cover, and so some kind of purposive sampling strategy needs to be designed and followed. Second, excluding all but a tiny minority of relevant studies on the grounds of rigour would reduce rather than increase the validity and generalizability of review findings, since (as explained later) different primary studies contribute different elements to the rich picture that constitutes the overall synthesis of evidence.

A realist review will use multiple search strategies that make deliberate use of purposive sampling – aiming to retrieve materials purposively to answer specific questions or test particular theories. A decision has to be made not just about which studies are fit for purpose in identifying, testing out or refining the programme theories, but also about when to stop looking – when sufficient evidence has been assembled to satisfy the theoretical need or answer the question. This test of saturation, comparable to the notion of theoretical saturation in qualitative research,24 can only be applied iteratively, by asking after each stage or cycle of searching whether the literature retrieved adds anything new to our understanding of the intervention and whether further searching is likely to add new knowledge. In practice, it is rare to find an overabundance of useable primary studies, and searching is more often driven by the material available and by the limits of time and funding.
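The saturation test described above is, in effect, a stopping rule applied after each cycle of searching. A minimal sketch of that loop, where run_search_cycle and new_insights_from stand in for the reviewer’s actual searching and reading (both hypothetical):

def search_until_saturated(queries, run_search_cycle, new_insights_from,
                           max_cycles=10):
    """Iterative purposive searching: stop when a cycle of searching adds
    nothing new to our understanding of the intervention (saturation).

    run_search_cycle(query) -> retrieved sources (hypothetical stand-in)
    new_insights_from(sources, understanding) -> set of new insights
    """
    understanding = set()
    for query in queries[:max_cycles]:
        sources = run_search_cycle(query)
        insights = new_insights_from(sources, understanding)
        if not insights:           # nothing new learned: saturation reached
            break
        understanding |= insights  # revise understanding and search again
    return understanding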

Purposive approaches to searching do not have the kind of neat, predefined sampling frame achievable through probability sampling. For instance, the reviewer might choose to venture across policy domains in picking up useful ideas on a programme theory. Schools have undergone a similar programme of public disclosure of performance data and there is no reason why that literature cannot reveal crucial accounts of intervention theories or even useful comparative tests of certain of those theories. Purposive sampling is also iterative in that it may need to be repeated as theoretical understanding develops. An understanding, say, of the media’s influence in distorting publicly disclosed information may only develop relatively late in a review, and researchers may be forced back to the drawing board and the search engines to seek further studies to help sort out an evolving proposition.


Searching in realist review is both iterative and interactive (involving tracking back and forth from the literature retrieved to the research questions and programme theories), and the search strategies and terms used are likely to evolve as understanding grows. Because useful studies in this respect will often make reference to companion pieces that have explored the same ideas, searching makes as much use of snowballing (pursuing references of references by hand or by means of citation-tracking databases) as it does of conventional database searching using terms or keywords. In a recent systematic review conducted along realist lines, one of us found that 52% of the empirical studies referenced in the final report were identified through snowballing, compared with 35% through database searching and 6% through hand searching.25
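Snowballing – pursuing references of references – amounts to a breadth-first traversal of the citation network, seeded with the studies already in hand. A sketch under the assumption of a references_of lookup (hypothetical; in practice a citation-tracking database or hand-checked reference lists):

from collections import deque

def snowball(seed_studies, references_of, is_relevant, max_depth=2):
    """Breadth-first snowball search: follow references of references
    from an initial set of studies, keeping those judged relevant.

    references_of(study) -> studies cited by (or citing) a study
    is_relevant(study)   -> the reviewer's fitness-for-purpose judgement
    """
    seen = set(seed_studies)
    found = []
    queue = deque((study, 0) for study in seed_studies)
    while queue:
        study, depth = queue.popleft()
        if is_relevant(study):
            found.append(study)
        if depth < max_depth:
            for reference in references_of(study):
                if reference not in seen:
                    seen.add(reference)
                    queue.append((reference, depth + 1))
    return found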

As far as the mechanics of searching go, realist review uses index headings, key word searches, search engines, databases and so forth in the same way as other systematic review methods. There are some different points of emphasis, however.

(a) Because it deals with the inner workings of interventions, realist review is much more likely to make use of grey literature rather than relying solely on articles in academic journals.

(b) Because it takes the underpinning mechanism of action rather than any particular topic area as a key unit of analysis, a much wider breadth of empirical studies may be deemed relevant, and these will sometimes be drawn from different bodies of literature. As mentioned above, studies on the public disclosure of performance data by schools will have important lessons for health care organizations and vice versa. Hence, a tight restriction on the databases to be searched is inappropriate.

(c) Because it looks beyond treatments and outcomes, the key words chosen to instigate a search are more difficult to fix. As a rough approximation, in terms of their ability to score useful ‘hits’, proper nouns (such as ‘The Lancet’) outstrip common nouns (such as ‘publications’), which in turn outdo abstract nouns (such as ‘publicity’). Theory building utilizes these terms in the opposite proportion. Accordingly, if, say, you are trying to locate material on ‘shame’ as experienced under ‘publicity’, snowballing is likely to be many times more fruitful than putting specific words into a ‘Medline’ or similar search.

Appraising the quality of evidence

Next in a traditional systematic review comes a stage in which the methodological quality of the primary studies is appraised. If research evidence is to have its say, then it should be free of bias, and should have been carried out to the highest methodological standards. Accordingly, at some stage in the process of evidence synthesis, a quality filter needs to be applied and flawed studies rejected.

Realist review supports this principle, but takes a different position on how research quality is judged. A systematic review of biomedical interventions is based firmly on the use of a hierarchy of evidence with the RCT sitting atop, and non-RCTs, before-and-after studies, descriptive case studies and (lowest of all) opinion pieces underneath. Realist review rejects a hierarchical approach because, as explained below, multiple methods are needed to illuminate the richer picture.

The problem with RCTs in testing complex service interventions is that, because such interventions are always conducted in the midst of (and are therefore influenced by) other programmes, they are never alike in their different incarnations – a point made recently in relation to the false grail of standardization.26 Any institution chosen as a match in a comparison will also be in the midst of a maelstrom of change, making a clean policy-on/policy-off comparison impossible. It is of course possible to perform RCTs on service delivery interventions, but such trials provide impoverished information because the RCT design is explicitly constructed to wash out the vital explanatory ingredients. While process evaluations may be conducted alongside RCTs to enable more detailed explanations, the basic issue of standardizing interventions remains.

Realist review, in the spirit of scientific enquiry, seeks to explore complex areas of reality by tailoring its methods eclectically to its highly diverse subject matter. Much contemporary effort and thinking has gone into producing appraisal checklists for non-RCT research, such as the Cabinet Office’s framework for assessing qualitative research (which runs to 16 appraisal questions and 68 potential indicators).27 But such checklists are not the answer to the complex challenge of realist review, for three reasons. First, such synthesis calls not merely upon conventional qualitative and quantitative research designs, but also on impact evaluations, process evaluations, action research, documentary analysis, administrative records, surveys, legislative analysis, conceptual critique, personal testimony, thought pieces and so on, as well as an infinite number of hybrids and adaptations of these.

Second, the study is rarely the appropriate unit of analysis. Very often, realist review will choose to consider only one element of a primary study in order to test a very specific hypothesis about the link between context, mechanism and outcome. While an empirical study must meet the minimum criteria of rigour and relevance to be considered, the study as a whole does not get included or excluded on the basis of a single aspect of quality. Finally, appraisal checklists designed for non-RCT research acknowledge the critical importance of judgement and discretion. For instance, a checklist for qualitative research might include a question on the clarity and coherence of the reportage.27 In such cases, the checklist does little more than assign structure and credibility to what are subjective judgements. There comes a point when cross-matching hundreds of primary studies with dozens of appraisal checklists, often drawing on more than one checklist per study, brings diminishing returns.

The realist solution is to cut directly to the judgement. As with the search for primary studies, it is useful to think of quality appraisal as occurring by stages.

Relevance: As previously discussed, relevance in realist review is not about whether the study covered a particular topic, but whether it addressed the theory under test.

Rigour: That is, whether a particular inference drawn by the original researcher has sufficient weight to make a methodologically credible contribution to the test of a particular intervention theory.

In other words, both relevance and rigour are not absolute criteria on which the study floats or sinks, but are dimensions of fitness for purpose for a particular synthesis. Let us consider an example. If we were searching for evidence on public disclosure of performance data, we might well wish to consider the extent to which such records play a part in patients’ decisions about whether to use a particular service, surgeon or hospital. Research on this might come in a variety of forms. We might find, for example, self-reported data on how people use performance data as part of a wider telephone survey on public views. We might find an investigation testing out respondents’ understanding of the performance tables by asking them to explain particular scores and ratios. We might find an attempt to track fluctuations in admissions and discharges against the publication of the report and other contiguous changes. We might find a quasi-experiment attempting to control the release of information to some citizens and not others, and following up for differences in usage. Finally, we might find qualitative studies of the perceptions that led to actual decisions to seek particular treatments. (See Marshall et al. for a profile of actual studies on this matter.9)

All of these studies would be both illuminating and flawed. The limitations of one would often be met with information from another. The results of one might well be explained by the findings from another. Such a mixed picture is routine in research synthesis and reveals clearly the perils of using a single hierarchy of evidence. But neither does it require taking on board all of the evidence uncritically. Good practice in synthesis would weigh up the relative contribution of each source, and this might involve dismissing some sources. The point is that, in good synthesis, you would see this reasoning made explicit. To synthesize is to make sense of the different contributions. The analysis, for instance, would actually spell out the grounds for being cautious about A because of what we have learned from B and what was indicated in C. Such a chain of reasoning illustrates our final point in this section and, indeed, the basic realist principle on quality assessment – namely, that the worth of studies is established in synthesis and not as a preliminary pre-qualification exercise. Further examination of quality issues in systematic reviews from the realist perspective may be found elsewhere.28
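The paper prescribes no recording apparatus for these judgements, but it may help to picture them concretely. Below is a minimal sketch, entirely ours and with every name hypothetical, of how relevance and rigour could be logged per inference rather than per study, as free-text rationales tied to a specific programme theory instead of pass/fail scores.

    # Illustrative sketch only; the authors prescribe no data structures.
    # Appraisal is recorded per inference, not per study, and relevance
    # and rigour are reasoned judgements rather than checklist scores.
    from dataclasses import dataclass

    @dataclass
    class AppraisalNote:
        source: str       # citation of the primary study
        inference: str    # the specific claim borrowed from it
        theory: str       # the programme theory that claim speaks to
        relevance: str    # why the inference bears on the theory under test
        rigour: str       # why the inference is methodologically credible
        caveat: str = ""  # grounds for caution, often supplied by other studies

    note = AppraisalNote(
        source="telephone survey of public views (hypothetical)",
        inference="few patients consult performance tables when choosing a surgeon",
        theory="public disclosure changes provider choice",
        relevance="tests the 'patient as informed chooser' link in the theory",
        rigour="large sample, though self-report may understate actual use",
        caveat="an admissions-tracking study suggests effects run via managers instead",
    )

The point of such a record is not automation but explicitness: the reasoning that establishes the worth of a study in synthesis is written down where it can be inspected.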

Extracting the data

In a conventional systematic review of a simple intervention, data extraction is generally achieved using a form designed to pull out the same items of information from each paper: number of participants, completeness of follow-up, mean effect size, and so on. In mediator and moderator systematic reviews, a wider range of additional information is collected from the primary studies about attributes of participants, settings and interventions. This information is cast in the form of variables, since a data matrix remains the intended product. Qualitative reviews sometimes make use of comprehensive and uniform data extraction sheets, but grid entries take the form of free text and usually consist of short verbal descriptions of key features of interventions and studies.

The realist reviewer may well make use of forms to assist the sifting, sorting and annotation of primary source materials. But such aids do not take the form of a single, standard list of questions. Rather, a menu of bespoke forms may be developed and/or the reviewer may choose to complete different sections for different sources. This is a consequence of the many-sided hypothesis that a realist review might tackle and the multiple sources of evidence that might be taken into account. Some primary sources may do no more than identify possible relevant concepts and theories; for these, data extraction can be achieved by marking the relevant sentences with a highlighter pen. Even those empirical studies that are used in testing mode are likely to have addressed just one part of the implementation chain, and thus come in quite different shapes and sizes.

Realist review thus assimilates information more by note-taking and annotation than by extracting data as such. If you are in theory-tracking mode, documents are scoured for ideas on how an intervention is supposed to work. These are highlighted, noted and given an approximate label. Further documents may reveal neighbouring or rival ideas. These are mentally bracketed together until a final model is built of the potential pathways of the intervention's theories. Empirical studies are treated in a similar manner, being scrutinized for which programme idea they address, what claims are made with respect to which theories and how the apposite evidence is marshalled. These are duly noted, and revised and amended as the testing strategy becomes clarified.
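To make the contrast with a standard extraction grid concrete, here is a brief sketch, again ours rather than the authors', of annotation keyed to provisional theory labels; all study names and labels are invented.

    # Illustrative sketch: free-text annotations accumulated under
    # provisional theory labels, in place of a fixed extraction grid.
    from collections import defaultdict

    # theory label -> list of (source, note) pairs
    annotations = defaultdict(list)

    def annotate(theory, source, note):
        """Record a free-text note against a provisional theory label."""
        annotations[theory].append((source, note))

    # Theory-tracking mode: label ideas about how the intervention works.
    annotate("shaming", "newspaper editorial", "disclosure works by embarrassing poor performers")
    annotate("informed choice", "survey study A", "patients said to switch providers on seeing ratings")

    # Testing mode: note which programme idea an empirical study addresses.
    annotate("informed choice", "admissions study B", "no fluctuation in admissions after publication")

    for theory, notes in annotations.items():
        print(theory, "->", len(notes), "annotations")

Neighbouring or rival labels can later be merged or split as the model of the intervention's pathways firms up.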

Two further features of the realist reading of evidence are worth noting. The first is that, as with any mode of research synthesis, you end up with the inevitable piles of paper as you try to recall which study speaks to which process and where a particular study belongs. Just as a conventional review will append a list of studies consulted and then give an indication of which contributed to the statistical analysis, so too should a realist review trace the usage and non-usage of primary materials, though the archaeology of decision-making is more complex. You are inspecting multiple theories, and specific studies may speak to none, one, more, or all of them. Nevertheless, as the method develops, you should expect to develop an interpretive trail of the different ways in which studies have been used or omitted.
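No such device appears in the paper, but the interpretive trail might be kept as simply as a two-level map from study to theory to disposition. A sketch under that assumption, with all study names invented:

    # Illustrative sketch: a trail of how each primary study was used
    # (or why it was set aside) against each theory examined.
    trail = {
        "survey study A": {
            "informed choice": "used: tested the patient-as-chooser link",
            "shaming": "not used: offers no data on provider responses",
        },
        "admissions study B": {
            "informed choice": "used: contradicted the predicted shifts in usage",
        },
    }

    for study, uses in trail.items():
        for theory, disposition in uses.items():
            print(f"{study} / {theory}: {disposition}")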

The second point to note is that the steps involved in a realist review are not linear; studies are returned to time and again, and thus extraction occurs all the way down the line. There always comes a rather ill-defined point in the sifting and sorting of primary models where you change from framework building to framework testing and from theory construction to theory refinement. You experience a shift from divergent to convergent thinking as ideas begin to take shape and the theories underpinning the intervention gain clarity. To some extent, this is true for any systematic review, even though classical accounts of the Cochrane approach imply a fully reproducible and thus essentially technical process.

Synthesizing the evidence

Realist review perceives the task of synthesis as one of refining theory. Decision-makers generally appreciate that programmes operate through highly elaborate implementation processes, passing through many hands and unfolding over time. Realist review starts with a preliminary understanding of that process, which it seeks to refine by bringing empirical evidence to the highways and byways of the initial theory map. It thus begins with theory and ends with – hopefully – more refined theory. What is achieved in synthesis is a fine-tuning of the understanding of how the intervention works. Synthesis refers to making progress in explanation. That explanatory quest is inevitably complex, and gains may be sought on a number of fronts; so a review may be directed at any or all of the following issues:

• WHAT is it about this kind of intervention that works, for WHOM, in what CIRCUMSTANCES, in what RESPECTS and WHY?

In the section on clarifying the scope of the review, we argued that the purpose of the review should be defined at the outset. The synthesis stage should keep this purpose in mind and hence focus on questioning programme theory integrity, adjudicating between rival programme theories, considering the same theory in comparative settings, or comparing official expectations with actual practice. What all these approaches have in common is a focus on the programme theory rather than the primary study as the unit of analysis, and the need to interrogate and refine the theory as synthesis progresses.
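Although the paper specifies no notation, the refined product of synthesis can be pictured as a set of context-mechanism-outcome statements attached to a programme theory, each a hedged, evidenced claim rather than an effect size. The following sketch is purely illustrative and every name in it is hypothetical.

    # Illustrative sketch: a refined programme theory as a set of
    # context-mechanism-outcome (CMO) statements. Not from the paper.
    from dataclasses import dataclass, field

    @dataclass
    class CMO:
        context: str    # for whom, in what circumstances
        mechanism: str  # how the intervention is thought to generate change
        outcome: str    # in what respects it works (or misfires)
        evidence: list = field(default_factory=list)  # supporting sources

    @dataclass
    class ProgrammeTheory:
        name: str
        cmos: list = field(default_factory=list)

    theory = ProgrammeTheory(
        name="public disclosure of performance data",
        cmos=[
            CMO(
                context="hospitals competing for referrals",
                mechanism="managers act pre-emptively to avoid poor ratings",
                outcome="internal quality drives begin before publication",
                evidence=["admissions study B"],
            ),
            CMO(
                context="patients with little effective choice of provider",
                mechanism="ratings rarely enter the decision",
                outcome="little or no shift in service use",
                evidence=["survey study A"],
            ),
        ],
    )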

Framing recommendations and disseminating findings

Systematic reviews generally finish with recommendations for dissemination and implementation. Contemporary accounts have stressed that, for research to be properly utilized, this concluding stage should go well beyond the submission of a final report to commissioners.29,30 Systematic review juries no longer tend to retire for several months and appear with a verdict. Rather, commissioners of reviews are increasingly involved in the production of the research synthesis (by working with reviewers to hone, refine and remove residual ambiguity from the research question), and reviewers increasingly bring their technical expertise closer to the policy question by ensuring that their research takes account of the practical needs of a range of stakeholders in the shaping of an intervention.

This healthy two-way dialogue is what Lomas has called linkage.31 Realist review raises the status of linkage from a recommendation to a methodological requirement. We have already argued that the tasks of identifying the review question and articulating key theories to be explored cannot meaningfully occur in the absence of input from practitioners and policy-makers, because it is their questions and their assumptions about how the world works that form the focus of analysis. Furthermore, the findings of a realist review must be expressed not as universal scientific truths, such as 'family intervention for schizophrenia produces a relative risk of relapse of 0.72',32 but in the cautious and contextualized grammar of policy discourse.

What the recommendations describe are the main series of decisions through which an initiative has proceeded. Empirical findings are put to use in alerting the policy community to the caveats and considerations that should inform those decisions – for example: 'remember A', 'beware of B', 'take care of C', 'D can result in both E and F', 'Gs and Hs are likely to interpret I quite differently', 'if you try J make sure that K, L and M have also been considered', 'N's effect tends to be short lived', 'O really has quite different components – P, Q and R', and 'S works perfectly well in T but poorly for U'. The review will, inevitably, also reflect that 'little is known about V, W, X, Y and Z'. Given such an objective, it is easy to see why the realist reviewer generally finds that linkage with the policy-making community when writing up accelerates rather than interferes with this task.

The issue of linkage raises the question of when the liaison between reviewers and decision-makers should occur. While the popular recommendation is, perhaps, that they should hold hands throughout the review, this is a somewhat unrealistic prospect. The collaboration is best located at the beginning of the process. In practice, this means the commissioner coming to the reviewer with a broad list of questions about an intervention. The reviewer questions the questions and suggests further angles that have resonated through the existing literature. Then there is more negotiation and, eventually, an agreement about which particular lines of inquiry to follow.

As well as this initial meeting of minds, realist review also anticipates that the review itself will partly reorder expectations about what is important. The realist perspective can hardly speculate on the likelihood of unintended consequences of interventions without applying the rule reflexively. This means that room must be left open for further rounds of negotiation about whether an unforeseen chink in the implementation chain deserves closer inspection. But, at several points in between, there are long periods when reviewers should be left to their own devices. They should, for example, be able to apply their academic expertise on matters such as the methodological rigour and relevance of the primary research materials.

The analysis and conclusions section of a realist review is not a final judgement on what works or the size of an effect. Rather, it takes the form of revisions to the initial understanding of how an intervention was thought to work. The progress made in a review is not one from ignorance to answer, but from some knowledge to some more knowledge. Accordingly, there is room for debate about the precise scope of the policy implications of a realist review. Policy-makers, of course, are likely to want the review to concentrate on the policy levers that they are actually able to pull. Extraordinary care must be taken at the point where findings are transformed into recommendations, and close involvement is once again needed.

Finally, we reach dissemination and implementation. The ultimate intended outcome, as with any systematic review, is that practitioners on the ground take note of the findings and implement them. However, whereas with a conventional review such changes might be measured in terms of simple behaviour change in the direction of particular recommendations (for example, are clinicians prescribing therapy X for condition Y?), implementation of the findings of a realist review is a complex process involving many actors, multiple processes and multiple levels of analysis. Furthermore, implementation is not a question of everyone stopping doing A and starting to do B. It is about individuals, teams and organizations taking account of all the complex and inter-related elements of the programme theory that have been exposed by the review, and applying these to their particular local contexts and implementation chains.

Thus, if a realist review is effectively disseminated and implemented, we might expect to see subtle shifts in emphasis in a programme in one setting, expansion of that programme as it stands in another setting, and complete abandonment of the same programme in a third setting, as informed judgements are made as to how different elements of the programme match up to what is now known about what works, for whom, how, and in what circumstances. Finally, and perhaps most importantly, we should expect the findings of a realist review to influence the design of new programmes.

Strengths and limitations of the realist approach

We have argued for the theoretical and practical strengths of realist review, notably that it has firm roots in philosophy and the social sciences. Realist review is not a method or formula, but a logic of enquiry that is inherently pluralist and flexible, embracing evidence that is qualitative and quantitative, formative and summative, prospective and retrospective, and so on. It seeks not to judge but to explain, and is driven by the question 'What works for whom in what circumstances and in what respects?'. Realist review learns from (rather than controls for) real-world phenomena such as diversity, change, idiosyncrasy, adaptation, cross-contamination and programme failure. It engages stakeholders systematically – as fallible experts whose insider understanding needs to be documented, formalized and tested – and provides a principled steer away from failed one-size-fits-all ways of responding to problems. By taking programme theory as its unit of analysis, realist review has the potential to maximize learning across policy, disciplinary and organizational boundaries.

However, realist review has important shortcomings that limit its applications. For example, realist review cannot be used as a formulaic, protocol-driven approach; it is more about principles that guide than rules that regularize. A realist review (whose key quality features are the judgements of the reviewer and the interpretative trail that demonstrates how particular empirical studies led to these judgements) is not standardizable or reproducible in the same sense as a conventional Cochrane review (in which the key quality features are technical standardization and clarity of presentation). This is not to say that quality assurance within the review process is any less important with realist methods; rather, the processes of quality assurance depend more on explicitness and reflexivity on the part of the person(s) undertaking the review. Another consideration is that realist review leads, at best, to tentative recommendations. It will never produce generalizable effect sizes, since all its conclusions are contextual. Whether this is a drawback or a virtue depends, of course, on your perspective. In terms of undertaking a review, the realist approach requires a high level of experience and training in both academic (critical appraisal of empirical studies) and service (programme implementation) domains. A further potential limitation of realist review is its relative newness and the fact that only a limited number of reviews have been completed to date.18,33

On the plus side, though, several of the limitations and shortcomings raised here can be explored and addressed as the methods are tested and the review techniques are refined and further developed.

The contribution of realist review to policy-making

We have outlined an approach to reviewing and synthesizing research evidence that we believe is well suited to the study of complex social interventions. But what function can realist review perform in the policy arena? The school of theory-based evaluation, of which realist evaluation is a member, has always described its appointed task as offering enlightenment as opposed to technical or partisan support.34,35 Enlightenment's positive prospect, for which there is a great deal of empirical evidence,36–38 is that the influence of research on policy occurs through the medium of ideas rather than of data. This is described by Weiss as 'knowledge creep', to illustrate the way research actually makes it into the decision-maker's brain.34 Research is unlikely to produce the facts that change the course of policy-making. Rather, policies are born out of the clash and compromise of ideas, and the key to enlightenment is to insinuate research results into this reckoning.39

On this score, realist review has considerable advantages. Policy-makers may struggle with recommendations expressed in terms of the respective statistical significance of an array of mediators and moderators. They are more likely to be able to interpret and to utilize an explanation of why a programme worked better in one context than another. Note that these two research strategies are serving to answer rather similar questions – the crucial point being that the one that focuses on sense-making has the advantage. This is especially so if the review has explicitly addressed the task of checking out rival explanations, because it can provide justification for taking one course of action rather than another (i.e. politics).

Perhaps the best metaphor for the end product is to imagine the research process as producing a sort of highway code to programme building, alerting policy-makers to the problems that they might expect to confront and some of the safest (i.e. best-tried and with widest applications) measures to deal with these issues. The highway code does not tell you how to drive, but how to survive the journey by flagging situations where danger may be lurking and extra vigilance is needed.

In some ways, realist review demands a lot of decision-makers. But realist review is fundamentally pragmatic, and much can be achieved through the drip, drip, drip of enlightenment. In the days before evidence-based policy, we had policy from the seat-of-the-pants of experience. Reasoning went something like this: 'we are faced with implementing this new scheme A, but it's rather like the B one we tried at C, and you may recall that it hit problems in terms of D and E, so we need to watch out for that again', and so on. Not only is realist review equipped to uphold and inform this kind of reasoning (if you like, to give it an evidence base), it is also well suited to tapping into the kind of informal knowledge sharing that is being encouraged through such schemes as the 'Breakthrough' quality improvement collaboratives that are part of the NHS Modernisation Programme, and which explicitly seek to transfer the sticky knowledge that makes for success in complex organizational innovations by bringing policy-makers and practitioners together in informal space.40 Realist review supplements this approach to organizational learning by thinking through the configurations of contexts and mechanisms that need to be attended to in fine-tuning a programme. We believe that the stage is set for developing explicit linkages between realist reviewers and contemporary initiatives in organizational development in the health services.

References

1 Bhaskar R. A Realist Theory of Science. 2nd edn. Brighton: Harvester Press, 1978
2 Harre R. Social Being. Oxford: Blackwell, 1978
3 Putnam H. Realism with a Human Face. Cambridge: Harvard University Press, 1990
4 Collier A. Critical Realism. London: Verso, 1994
5 Layder D. Sociological Practice. London: Sage, 1998
6 Greenwood J. Realism, Identity and Emotion. London: Sage, 1994
7 Lawson T. Economics and Reality. London: Routledge, 1997
8 Pawson R, Tilley N. Realistic Evaluation. London: Sage, 1997
9 Marshall M, Shekelle P, Brook R, Leatherman S. Dying to Know: Public Release of Information about Quality of Health Care. London: Nuffield Trust, 2000
10 Øvretveit J, Bate P, Cretin S, et al. Quality collaboratives: lessons from research. Qual Saf Health Care 2002;11:345–51
11 Øvretveit J, Gustafson D. Evaluation of quality improvement programmes. Qual Saf Health Care 2002;11:270–5
12 Cochrane Reviewers' Handbook 4.2.0. The Cochrane Library, March 2004
13 Mays N, Pope C, Popay J. Systematically reviewing qualitative and quantitative evidence to inform management and policy making in the health field. J Health Serv Res Policy 2005;10(Suppl 1):6–20
14 Greenhalgh T. Meta-narrative mapping: a new approach to the synthesis of complex evidence. In: Hurwitz B, Greenhalgh T, Skultans V, eds. Narrative Research in Health and Illness. London: BMJ Publications, 2004
15 Connell J, Kubish A, Schorr L, Weiss C. New Approaches to Evaluating Community Initiatives. Washington: Aspen Institute, 1995
16 Weiss C. Which links in which theories shall we evaluate? In: Rogers P, Hasci I, Petrosino A, Hubner T, eds. Program Theory in Evaluation: Challenges and Opportunities. San Francisco: Jossey Bass, 2000
17 Pawson R. A Measure for Measures: A Manifesto for Empirical Sociology. London: Routledge, 1989
18 Pawson R. Does Megan's Law Work? A Theory-Driven Systematic Review. Working Paper No. 8, ESRC UK Centre for Evidence-Based Policy and Practice. London: Queen Mary, University of London, 2002
19 Fisse B, Braithwaite J. The Impact of Publicity on Corporate Offenders. Albany: State University of New York Press, 1983
20 Braithwaite J. Crime, Shame and Reintegration. Cambridge: Cambridge University Press, 1989
21 Wolfsfeld G. Media and Political Conflict. Cambridge: Cambridge University Press, 1997
22 NHS Modernisation Agency Clinical Governance Support Team. Learning Through Supporting Improvements: Change from Challenge? Leicester: NHS Modernisation Agency, 2004
23 Knox C. Concept mapping in policy evaluation: a research review of community relations in Northern Ireland. Evaluation 1995;1:65–79
24 Glaser B, Strauss A. Discovery of Grounded Theory. Chicago: Aldine, 1967
25 Greenhalgh T, Robert G, Bate P, Kyriakidou O, Macfarlane F, Peacock R. How to Spread Ideas: A Systematic Review of the Literature on Diffusion, Dissemination and Sustainability of Innovations in Health Service Delivery and Organisation. London: BMJ Publications, 2004
26 Hawe P, Shiell A, Riley T. Complex interventions: how 'out of control' can a randomised controlled trial be? BMJ 2004;328:1561–3
27 Spencer L, Ritchie J, Lewis J, Dillon N. Quality in Qualitative Evaluation: A Framework for Assessing Research Evidence. London: Cabinet Office, Strategy Unit, 2003
28 Pawson R. Assessing the Quality of Evidence in Evidence-Based Policy: Why, How and When. Working Paper No. 1, ESRC Research Methods Programme. Manchester: University of Manchester, 2003. Available at: www.ccsr.ac.uk/methods
29 Nutley SM, Davies HTO. Making a reality of evidence-based practice: some lessons from the diffusion of innovations. Public Money Manage 2001;20:35–42
30 Walter I, Davies H, Nutley S. Increasing research impact through partnerships: evidence from outside health care. J Health Serv Res Policy 2003;8(Suppl 2):58–61
31 Lomas J. Using 'linkage and exchange' to move research into policy at a Canadian foundation. Health Affairs 2000;19:236–40
32 Pharoah FM, Rathbone J, Mari JJ, Streiner D. Family intervention for schizophrenia. Cochrane Database of Systematic Reviews. Oxford: Update Software, 2003
33 Pawson R. Mentoring Relationships: An Explanatory Review. Working Paper No. 21, ESRC Centre for Evidence-Based Policy and Practice. London: Queen Mary, University of London, 2004
34 Weiss CH. The circuitry of enlightenment. Knowl Creation Diffus Util 1987;8:274–81
35 Weiss CH, Bucuvalas MJ. Social Science Research and Decision-Making. Newbury Park: Sage, 1980
36 Deshpande R. Action and enlightenment functions of research. Knowl Creation Diffus Util 1981;2:134–45
37 Lavis JN, Ross SE, Hurley JE, et al. Examining the role of health services research in public policymaking. Milbank Q 2002;80:125–54
38 Dobrow MJ, Goel V, Upshur RE. Evidence-based health policy: context and utilisation. Soc Sci Med 2004;58:207–17
39 Exworthy M, Berney L, Powell M. How great expectations in Westminster may be dashed locally: the local implementation of national policy on health inequalities. Policy Polit 2003;30:79–96
40 Bate SP, Robert G. Report on the 'Breakthrough' Collaborative Approach to Quality and Service Improvement in Four Regions of the NHS: A Research Based Evaluation of the Orthopaedic Services Collaborative within the Eastern, South and West, South East and Trent Regions. Birmingham: Health Services Management Centre, University of Birmingham, 2002