ISPOR Task Force Report: ITC & NMA Study Questionnaire DRAFT FOR REVIEW ONLY
INDIRECT TREATMENT COMPARISON / NETWORK META-ANALYSIS
STUDY QUESTIONNAIRE TO ASSESS RELEVANCE AND
CREDIBILITY TO INFORM HEALTHCARE DECISION-MAKING: AN
ISPOR-AMCP-NPC GOOD PRACTICE TASK FORCE REPORT
DRAFT REPORT FOR REVIEW
VERSION SEPTEMBER 24, 2013
©2013 International Society for Pharmacoeconomics and Outcomes Research. All rights reserved under
International and Pan-American Copyright Conventions.
ABSTRACT
Evidence-based health-care decision making requires comparisons of efficacy and safety of all
relevant competing interventions for a certain disease state. When the available randomized
controlled trials of interest do not all compare the same interventions – but each trial compares
only a subset of the interventions of interest – it is possible to represent the evidence base as a
network where all trials have at least one intervention in common with another. Such a network
of trials involving treatments compared directly or indirectly can be synthesized by means of a
network meta-analysis. Despite the great realized or potential value of network meta-analysis to
inform healthcare decision-making, many decision-makers might not be very familiar with these
techniques and the use of network meta-analysis in health care decisions is still lagging. The
Task Force developed a consensus-based questionnaire to help decision makers assess the relevance and credibility of indirect treatment comparisons and network meta-analyses intended to inform healthcare decision-making. The developed questionnaire consists of 26 questions. The domain of relevance (4 questions) addresses the extent to which the results of the network meta-analysis, if accurate, apply to the setting of interest to the decision-maker. The credibility domains address the extent to which the network meta-analysis accurately answers the question it is designed to answer. The 22 credibility questions relate to the evidence base used, analysis methods, reporting quality & transparency, interpretation of findings, and conflict of interest. The questionnaire aims to provide a guide for assessing the degree of confidence that should be placed in a network meta-analysis and to enable decision-makers to gain awareness of the subtleties involved in evaluating networks of evidence from these kinds of studies. It is anticipated that user feedback will permit periodic evaluation and modification of the questionnaire. The goal is to make the questionnaire as useful as possible to the healthcare decision-making community.
Keywords: network meta-analysis, indirect comparisons, mixed treatment comparisons, validity,
bias, relevance, credibility, questionnaire, checklist, decision-making
BACKGROUND TO THE TASK FORCE
[To be added]
INTRODUCTION
Four Good Practices Task Forces developed a consensus-based set of questionnaires to help decision makers evaluate 1) prospective and 2) retrospective observational studies, 3) network meta-analysis (indirect treatment comparison), and 4) decision analytic modeling studies with greater uniformity and transparency [1-3]. The primary audiences of these questionnaires are assessors and reviewers of health care research studies for health technology assessment, drug formulary, and health care services decisions, who require varying levels of knowledge and expertise. This report focuses on the questionnaire to assess the relevance and credibility of network meta-analysis (indirect treatment comparison).
Systematic reviews of randomized controlled trials (RCTs) are considered the gold standard for evidence-based health care decision-making for clinical treatment guidelines, formulary management, and reimbursement policies. Many systematic reviews use meta-analysis of RCTs to combine quantitative results of several similar studies and summarize the available evidence [4]. Sound, comprehensive decision-making requires comparisons of all relevant competing interventions. Ideally, robustly and comprehensively designed RCTs would simultaneously compare all interventions of interest. Such studies are almost never available, thereby complicating decision-making [5-8]. New drugs are often compared with placebo or standard care, but not against each other, in trials aimed at obtaining approval for drug licensing; there may be no commercial incentive to compare the new treatment with an active control treatment [8,9]. Even if there were an incentive to incorporate competing interventions in an RCT, the interventions of interest may vary by country or may have changed over time due to new evidence and treatment insights. Moreover, for some indications, the relatively large number of competing interventions makes a trial incorporating all of them impractical.
In the absence of trials involving a direct comparison of treatments of interest, an indirect comparison can provide useful evidence for the difference in treatment effects between
competing interventions (which otherwise would be lacking) and for judiciously selecting the best choice(s) of treatment. For example, if two particular treatments have never been compared against each other head-to-head, but both have been compared against a common comparator, then an indirect treatment comparison can use the relative effects of the two treatments versus the common comparator [10-15].
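The common-comparator calculation described here (often called the adjusted indirect comparison, or Bucher method) can be sketched in a few lines. All effect estimates and standard errors below are hypothetical illustrations, not results from any actual trial.

```python
import math

def bucher_indirect(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C versus B from AB and AC trial results.

    d_ab, d_ac: relative effects of B vs. A and C vs. A (e.g., log odds ratios).
    se_ab, se_ac: their standard errors.
    """
    d_bc = d_ac - d_ab                      # the common comparator A cancels out
    se_bc = math.sqrt(se_ab**2 + se_ac**2)  # variances add for independent trials
    ci95 = (d_bc - 1.96 * se_bc, d_bc + 1.96 * se_bc)
    return d_bc, se_bc, ci95

# Hypothetical log odds ratios: B vs. A = -0.5 (SE 0.20), C vs. A = -0.8 (SE 0.25)
d_bc, se_bc, ci95 = bucher_indirect(-0.5, 0.20, -0.8, 0.25)
print(round(d_bc, 2), round(se_bc, 2))  # C vs. B estimate with its standard error
```

Note that the indirect estimate's standard error is larger than that of either direct comparison, reflecting the lower precision of indirect evidence.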
Although it is often argued that indirect comparisons are only needed when direct comparisons are not available, it is important to realize that both direct and indirect evidence contribute to the total body of evidence. Indirect evidence, in combination with direct evidence, may strengthen the assessment of treatments directly evaluated [6]. Even when the results of the direct evidence are conclusive, combining them with indirect estimates in a mixed treatment comparison (MTC) may yield a more refined and precise estimate for the interventions directly compared and broaden inference to the population sampled [12].
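The gain in precision from combining direct and indirect evidence can be illustrated with a simple fixed-effect, inverse-variance pooling sketch. The estimates below are hypothetical, and real mixed treatment comparisons typically use more general network models rather than this two-estimate shortcut:

```python
def combine_direct_indirect(d_direct, se_direct, d_indirect, se_indirect):
    """Pool a direct and an indirect estimate of the same treatment contrast
    by inverse-variance weighting (a simple fixed-effect sketch)."""
    w_dir = 1.0 / se_direct**2
    w_ind = 1.0 / se_indirect**2
    d_pooled = (w_dir * d_direct + w_ind * d_indirect) / (w_dir + w_ind)
    se_pooled = (w_dir + w_ind) ** -0.5
    return d_pooled, se_pooled

# Hypothetical C vs. B log odds ratios: direct -0.40 (SE 0.30), indirect -0.20 (SE 0.40)
d, se = combine_direct_indirect(-0.40, 0.30, -0.20, 0.40)
print(round(d, 3), round(se, 3))  # pooled SE is smaller than the direct SE of 0.30
```

The pooled estimate sits between the two inputs and is more precise than either, which is the "more refined and precise estimate" the text refers to.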
When the available RCTs of interest do not all compare the same interventions – but each trial compares only a subset of the interventions of interest – it is possible to represent the evidence base as a network where all trials have at least one intervention in common with another trial. Such a network of trials involving treatments compared directly, indirectly, or both can be synthesized by means of network meta-analysis [14]. In a traditional meta-analysis, all included studies compare the same intervention with the same comparator. Network meta-analysis extends this concept by including multiple pair-wise comparisons across a range of interventions and provides (indirect) estimates of relative treatment effect for multiple treatment comparisons for comparative or relative effectiveness purposes. In this report, we use the terms network meta-analysis and indirect comparison interchangeably.
Despite the great realized or potential value of network meta-analysis to inform healthcare decision-making and its increasing acceptance [e.g., by the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia, the Canadian Agency for Drugs and Technologies in Health (CADTH), the National Institute for Health and Clinical Excellence (NICE) in the United Kingdom, the Institute for Quality and Efficiency in Health Care (IQWiG) in Germany, and the Haute Autorité de Santé (HAS) in France], many decision-makers might not be very familiar with these techniques, and the use of network meta-analysis in health care decisions is still lagging. A major dilemma
decision-makers face is the degree of confidence they should place in the results obtained with a network meta-analysis. There is a critical need for education on network meta-analysis, as well as for transparent and uniform ways to assess the quality of reported network meta-analyses to help inform decision-making.
In creating a questionnaire for health care decision makers to assess network meta-analyses, the Task Force was charged with achieving two goals. One goal was to provide a guide for assessing the degree of confidence that should be placed in such a study. The aim was to create a questionnaire that would be relatively easy and quick to use for individuals without an in-depth knowledge of design and statistical issues. The second goal was for the questionnaire to have educational and instructional value to decision-makers and other users, enabling them to gain awareness of the subtleties involved in evaluating network meta-analysis; we sought an instrument that could prompt or invite users to obtain additional education on the underlying methodologies.
QUESTIONNAIRE DEVELOPMENT
One issue in creating questionnaires for decision makers is whether they should be linked to scorecards, annotated scorecards, or checklists. The Task Force raised concerns that a scorecard with an accompanying scoring system may be misleading because it would not have adequate validity and measurement properties. Scoring systems may also provide users with a false sense of precision and have been shown to be problematic in the interpretation of randomized trials [16].
An alternative to a scorecard is a checklist. However, the Task Force felt that checklists might mislead users in their assessments because a study may satisfy all of the elements of a checklist and still harbor "fatal flaws" in the methods applied in the publication. Moreover, users might have the tendency to count up the number of elements present, converting it into an implicit score, and then apply that implicit scoring to their overall assessment of the evidence. In addition, the applicability of a study may depend on whether there is any other evidence that addresses the specific issue or the decision being made. In general, an evidence evaluator needs
to be aware of the strengths and weaknesses of each piece of evidence and apply his or her own reasoning. Indeed, the Task Force felt that, in addition to the potential for implicit or explicit scoring to be misleading, such scoring would also undermine the educational goal of the questionnaires.
The Task Force decided to develop a questionnaire characterized by two principal concepts: relevance and credibility. Relevance addresses the extent to which the results of the network meta-analysis, if trustworthy, apply to the setting of interest to the decision-maker. The domain of relevance addresses questions related to the population, comparators, endpoints, timeframe, and policy-relevant differences. Credibility was defined as the extent to which the network meta-analysis or indirect comparison accurately or validly answers the question it is designed to answer. To generate the questions for the credibility section, the Task Force relied on the expertise of its members and on the scientific literature, including the reports of the ISPOR Task Force and other pertinent publications on indirect treatment comparisons [10-15]. Items and suggested wording were also informed by earlier and recent efforts that provided guidance to evidence reviewers [17,18]. These questions were grouped into logical domains: evidence base used for the indirect comparison or network meta-analysis; analysis; reporting quality & transparency; interpretation; and conflict of interest.
The developed questionnaire consists of 6 domains with a total of 26 questions related to the relevance (4 questions) and credibility (22 questions) of a network meta-analysis. Each question can be scored with "Yes," "No," or "Can't Answer." "Can't Answer" can be used if the item is not reported, there is insufficient information in the report, or the assessor does not have sufficient training to answer the question. For one question, a "No" implies a "fatal flaw." A fatal flaw suggests significant opportunities for the findings to be misleading and that the decision maker should use extreme caution in applying the findings to inform decisions. However, the occurrence of a fatal flaw neither prevents a user from completing the questionnaire nor requires the user to judge the evidence as "insufficient" for use in decision making. The presence of a fatal flaw should raise a strong caution and should be carefully considered when the overall body of evidence is reviewed. Based on the number of items scored as satisfactory in each of the 6 domains, an overall judgment of the strength of each domain is then provided: "Strength," "Neutral," "Weakness," or "Fatal flaw." Based on the domain judgments, the overall relevance
and credibility of the network meta-analysis for decision-making are assessed separately by the user as either "Sufficient" or "Insufficient."
Following self-testing by the Task Force members during September/October of 2012 and subsequent modification of the questionnaire, the revised questionnaire was made available to volunteers interested in user testing. During April and May of 2013, each volunteer was asked to evaluate three published network meta-analyses with the questionnaire being developed. Based on the feedback and the agreement among the ratings provided, the current version of the questionnaire was found helpful in assisting users to systematically assess network meta-analysis studies. The questionnaire is provided in Appendix A.
QUESTIONNAIRE ITEMS
This section provides detailed information to facilitate answering the 26 questions. A glossary is provided in Appendix B.
Relevance
Relevance addresses the extent to which the results of the study apply to the setting of interest to the decision-maker. Relevance is a relative term that has to be determined by each decision-maker; the judgment of one decision-maker will not necessarily apply to other decision-makers.
1. Is the population relevant?
This question addresses whether the populations of the RCTs that form the basis for the network meta-analysis sufficiently match the population of interest to the decision-maker. Relevant population characteristics are not limited to the specific disease of interest but also pertain to disease stage, severity, comorbidities, treatment history, race, age, gender, and possibly other demographics. Typically, RCTs included in the network meta-analysis are identified by means of a systematic literature search, with the relevant studies in terms of population pre-defined by study selection criteria. If these criteria are reported, they are a good starting point to
judge relevance in terms of population. Evidence tables with inclusion criteria and baseline patient characteristics for each study provide the most relevant information to judge population relevance; exclusion criteria are also noteworthy. For example, if a decision involves covering a Medicare Part D population (e.g., those older than 65), studies with few patients above 60 years of age may be less relevant.
2. Are any relevant interventions missing?
This question addresses whether the intervention(s) included in the network meta-analysis match the one(s) of interest to the decision-maker and whether all relevant comparators have been considered. Depending on the included RCTs, the network meta-analysis may include additional interventions not necessarily of interest to the decision-maker; this does not, however, compromise relevance. Considerations when judging the relevance of included biopharmaceuticals are the dose and schedule of a drug, its mode of administration, and the background treatment. Whether the drug is used as induction or maintenance treatment can be relevant as well. For other medical technologies, one can consider whether the procedure or technique in the trials is the same as the procedure or technique of interest to the decision-maker.
3. Are any relevant outcomes missing?
This question asks what outcomes are assessed in the study and whether those outcomes are meaningful to the decision maker. There has been increasing emphasis on outcomes that are directly meaningful to the patient or the health system, such as cardiovascular events (e.g., myocardial infarction, stroke) or patient functioning and health-related quality of life (e.g., SF-36, EQ-5D), and decreasing emphasis on surrogate endpoints (e.g., cholesterol levels) unless validated. Other considerations include the feasibility of measuring relevant (i.e., final) outcomes, the predictive relationship between surrogate outcomes and final outcomes, and what kind of evidence will be considered "good enough" given the patient population, the burden of the condition, and the availability of alternative treatments (and the evidence supporting those treatments). Not only the endpoints themselves are of interest, but also their time points. For example, a network meta-analysis that included RCTs with a longer follow-up may be more relevant to help inform treatment decisions for a chronic disease than a network meta-analysis limited to studies with a short follow-up (if follow-up is related to treatment effect).
4. Is the context (settings & circumstances) applicable?
This question addresses whether there are any differences between the RCTs included in the network meta-analysis and the setting and circumstances of interest to the decision-maker that do not concern the population, interventions, or outcomes but may still render the findings not applicable. For example, the years when the studies included in the network meta-analysis were performed can be of interest when background medical care of a certain disease has changed dramatically over time; the standard treatment may change accordingly. Many of the RCTs underlying network meta-analyses comparing biopharmaceuticals are designed for efficacy ("can it work") purposes, and therefore the setting or circumstances may differ from the real-world setting of interest to the decision-maker. A relevant question to ask, therefore, is whether the relative treatment effects and the rank ordering of interventions obtained with the network meta-analysis would still hold if real-world compliance or adherence were taken into consideration. The answer might be "no" if some of the interventions are associated with a much lower compliance in the real-world setting than other interventions.
Credibility
Once the network meta-analysis is considered sufficiently relevant, its credibility is assessed. Credibility is defined as the extent to which the network meta-analysis or indirect comparison accurately or validly answers the question it is designed to answer. For the assessment questionnaire, we take the position that credibility is not limited to internal validity (i.e., whether the observed treatment effects resulting from the analysis reflect the true treatment effects) but also covers reporting quality & transparency, interpretation, and conflict of interest (see Figure 1). The internal validity of the network meta-analysis can be compromised by bias in the identification and selection of studies, bias in the individual studies included, and bias introduced by the use of inappropriate statistical methods.
Figure 1: Overview of domains related to assessment of the credibility of a network meta-analysis
Evidence base used for the indirect comparison or network meta-analysis
The first 6 questions of the credibility domain pertain to the validity of the evidence base feeding information into the network meta-analysis.
1. Did the researchers attempt to identify and include all relevant randomized controlled trials?
In order to have a network meta-analysis that reflects the currently available evidence, a systematic literature search needs to be performed. Although there is no guarantee that all relevant studies can be identified with the search, it is important that the researchers at least attempted to achieve this goal. Important questions to ask are:
- Was the literature search strategy defined in such a way that available randomized controlled trials concerning the relevant interventions could be identified?
- Were multiple databases searched (e.g., MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials)?
- Were review selection criteria defined in such a way that relevant randomized controlled trials were included if identified with the literature search?
When the answer to each of these questions is "yes," one may conclude that the researchers attempted to include all available relevant RCTs that have been published as full-text reports. As additional assurance that no key studies are missing, a search of clinical trial registries, such as clinicaltrials.gov, is also recommended.
2. Do the trials for the interventions of interest form one connected network of randomized controlled trials?
In order to allow comparisons of the treatment effects of all interventions included in the network meta-analysis with each other, the evidence base used for the network meta-analysis should meet two requirements. First, all of the RCTs must include at least one intervention (or placebo) in common with another trial. Second, all of the (indirect) treatment comparisons must be part of the same network. In Figure 2, four different evidence networks are presented. The nodes reflect interventions, and the edges reflect one or several RCTs, i.e., direct comparisons. The network in
Figure 2A includes AB studies comparing intervention B with A and AC studies comparing intervention C with A. The relative treatment effect of C versus B can be obtained with the indirect comparison of the AB and AC studies. The network in Figure 2B also includes CD studies, thereby allowing treatment comparisons of interventions A, B, C, and D. The absence of an edge between two nodes in Figures 2A and 2B indicates that the relative treatment effects are obtained by means of an indirect comparison. The networks in Figures 2C and 2D are characterized by a closed loop, which implies that for some of the treatment comparisons there is both direct and indirect evidence. In Figure 2C there is both direct and indirect evidence for the AB, AC, and BC comparisons. In Figure 2D there is direct and indirect evidence for all comparisons with the exception of the AD and BC contrasts, for which there is only indirect evidence.
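The connectivity requirement can be checked mechanically by treating interventions as graph nodes and trials as edges between the arms they compare. A sketch (the intervention labels are illustrative):

```python
from collections import defaultdict

def is_connected(trials):
    """Check whether a set of trials forms one connected evidence network.

    trials: iterable of trials, each given as the set of intervention labels
    (including placebo, if present) that the trial compares.
    Returns True if every intervention can be reached from every other via
    trials that share at least one common arm.
    """
    graph = defaultdict(set)
    for arms in trials:
        arms = list(arms)
        for a in arms:
            for b in arms:
                if a != b:
                    graph[a].add(b)
    if not graph:
        return True
    # traverse from an arbitrary intervention and see what is reachable
    start = next(iter(graph))
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen == set(graph)

# Figure 2A-style network: AB and AC trials share comparator A -> connected
print(is_connected([{"A", "B"}, {"A", "C"}]))
# AB and CD trials with no common arm -> two disconnected sub-networks
print(is_connected([{"A", "B"}, {"C", "D"}]))
```

A network meta-analysis is only feasible over the interventions inside one connected component.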
Figure 2: Networks of randomized controlled trials to allow for network meta-analysis and indirect comparisons. All interventions that can be indirectly compared are part of one network.
In RCTs, the observed outcome with an intervention is the result of study characteristics, patient characteristics, and the treatment itself. In a placebo-controlled trial, the result of the placebo arm reflects the impact of study and patient characteristics on the outcome of interest, say outcome y (see Figure 3). In other words, the placebo response is the result of all known and unknown prognostic factors other than active treatment. We can call this the study effect. In the active intervention arm of the trial, the observed outcome y is a consequence of the study effect and a treatment effect. By randomly allocating patients to the intervention and placebo groups, both known and unknown prognostic factors (as well as both measured and unmeasured prognostic factors) are on average balanced between the different groups within a trial. Hence, the study effect as observed in the placebo arm is assumed to be the same in the active intervention arm, and therefore the difference between the active intervention arm and the placebo arm (say delta y) is attributable to the active intervention itself, resulting in a treatment effect (the blue box in Figure 3).
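The decomposition described above can be made concrete with hypothetical numbers:

```python
# Hypothetical decomposition of outcome y in a placebo-controlled trial.
study_effect = 0.30      # placebo response: all prognostic factors combined
treatment_effect = 0.15  # effect attributable to the active intervention

y_placebo = study_effect                    # observed outcome, placebo arm
y_active = study_effect + treatment_effect  # observed outcome, active arm

# Randomization balances prognostic factors across arms, so the between-arm
# difference (delta y) isolates the treatment effect:
delta_y = y_active - y_placebo
print(round(delta_y, 2))  # equals the treatment effect, 0.15
```

The key assumption, guaranteed by randomization within a trial but not across trials, is that the study effect is identical in both arms.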
Figure 3: Treatment effects, study effects, effect modifiers and prognostic factors in a randomized placebo-controlled trial.
With an indirect comparison, interest centers on the comparison of the treatment effects of interventions that have not been studied in a head-to-head fashion. In order to ensure that the indirect comparisons of interventions are not affected by differences in study effects between studies, we
want to consider only the treatment effects of each trial. This implies that all interventions indirectly compared have to be part of one network of trials where each trial has at least one intervention (or placebo) in common with another trial, as illustrated in Figure 4. If some interventions of interest are not part of the same network, then it is not possible to perform an indirect comparison of the treatment effects of these interventions without a substantial risk of bias, as illustrated in Figure 5.
Figure 4: Indirect comparison of AB, AC and CD studies where each trial is part of one network. In terms of treatment effects, B dominates A, C dominates B, and D dominates C.
Figure 5: AB and CD studies that do not have an intervention in common and for which an indirect comparison is not feasible.
3. Is it apparent that poor quality studies were included, thereby leading to bias?
The validity of the network meta-analysis is at risk not only when certain studies are not identified, but also when the internal validity of the individual RCTs is compromised. In order to answer this credibility question, the network meta-analysis report should have provided summary information on key study characteristics of each RCT, such as the method of randomization, treatment allocation concealment, blinding of the outcome assessor, and dropout. Frequently, a network meta-analysis report provides an overview of bias in individual studies as assessed with a specific checklist for individual study validity, such as the Cochrane Collaboration's tool for assessing risk of bias in randomized trials [19].
4. Is it likely that bias was induced by selective reporting of outcomes in the studies?
Outcome reporting bias occurs when outcomes of the individual trials are selected for publication based on their findings [20]. This can result in biased treatment effect estimates obtained with the network meta-analysis. As an assessment of the likelihood of bias due to selective reporting of outcomes, a determination can be made whether there is consistency in the studies used for the network meta-analysis with respect to the different outcomes. In other words, check whether any of the selected studies do not report some of the outcomes of interest and were therefore not included in some of the network meta-analyses of the different endpoints. It can be informative to look at the original publication of the suspect study to see whether there is any additional information. Furthermore, if reported in the network meta-analysis report, check the reasons why studies were excluded from the systematic review to ensure no eligible studies were excluded only because the outcome of interest was not reported [20]. If this were the case, a related risk of bias (publication bias) would manifest itself in the overall findings of the network meta-analysis.
5. Are there systematic differences in treatment effect modifiers (i.e., baseline patient or study characteristics that impact the treatment effects) across the different treatment comparisons in the network?
Study and patient characteristics can have an impact on the observed outcomes in the intervention and control arms of an RCT. As mentioned previously, study and patient
characteristics that influence the outcome to the same extent in the intervention and placebo arms are called prognostic factors. More specifically, the placebo response or study effect captures the impact on the outcome of interest of all study and patient characteristics that are prognostic factors.
Study and patient characteristics that influence the difference between the intervention and placebo arms regarding the outcome of interest are treatment effect modifiers (Figure 3). For example, if a medical intervention only works for men and not for women, a trial among men will demonstrate a positive treatment effect relative to placebo, whereas a trial among only women will not; gender is a treatment effect modifier for that intervention. As another example, if the outcome of interest (e.g., improvement in pain) is greater in a 24-week trial than in a 12-week trial, and there is no difference in the treatment effect of the intervention relative to placebo between the 24-week and 12-week trials, then trial duration is only a prognostic factor and not an effect modifier. If a variable is both a prognostic factor for the outcome of interest and a treatment effect modifier, then the placebo response (or baseline risk) of a placebo-controlled trial is associated with the treatment effect.
Although the indirect comparison or network meta-analysis is based on RCTs, randomization does not hold across the set of trials used for the analysis; patients are not randomized to different trials. As a result, there can be systematic differences in study characteristics or in the distribution of patient characteristics across trials. In general, if there is an imbalance in the distribution of effect modifiers across the different types of direct comparisons in a network meta-analysis, the corresponding indirect comparisons are biased [21,22].
In Figures 6-8, examples of valid and biased indirect comparisons and network meta-analyses are provided. It is important to acknowledge that there is always some risk of imbalances in unknown or unmeasured effect modifiers between studies evaluating different interventions. Accordingly, there is always a small risk of residual confounding bias, even if all observed effect modifiers are balanced across the direct comparisons.
Figure 6: Indirect comparison of an AB and AC study with different proportions of patients with moderate and severe disease. Disease severity is an effect modifier. The indirect comparison for the moderate disease population is valid, as is the indirect comparison for the severe population. The indirect comparison for the overall population is biased because the distribution of the effect modifier severity differs between the AB and AC studies.
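The bias illustrated in Figure 6 can be reproduced with a few lines of arithmetic. All numbers below are invented for illustration: two hypothetical trials (AB and AC) enroll different mixes of moderate and severe patients, and disease severity modifies the treatment effect.

```python
# Hypothetical true relative effects vs common comparator A, by severity stratum.
# B and C are equally effective within each stratum, so the true C-vs-B
# difference is zero everywhere.
effect_vs_A = {
    "moderate": {"B": 10.0, "C": 10.0},
    "severe":   {"B": 4.0,  "C": 4.0},
}

# The proportion of moderate patients differs between the two trials (imbalance)
prop_moderate = {"AB": 0.8, "AC": 0.2}

def observed_effect(trial, drug):
    """Trial-level effect vs A: severity-weighted average of stratum effects."""
    p = prop_moderate[trial]
    return p * effect_vs_A["moderate"][drug] + (1 - p) * effect_vs_A["severe"][drug]

d_AB = observed_effect("AB", "B")   # 0.8*10 + 0.2*4 = 8.8
d_AC = observed_effect("AC", "C")   # 0.2*10 + 0.8*4 = 5.2

# Within each stratum the indirect comparison C vs B is valid (zero) ...
indirect_moderate = effect_vs_A["moderate"]["C"] - effect_vs_A["moderate"]["B"]
indirect_severe = effect_vs_A["severe"]["C"] - effect_vs_A["severe"]["B"]

# ... but the overall indirect comparison is biased by the imbalance
indirect_overall = d_AC - d_AB      # -3.6, although the true difference is 0
```

The stratum-specific indirect comparisons recover the truth; only the overall comparison, which mixes the imbalanced severity distributions, is biased.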
Figure 7: Valid network meta-analysis of AB and AC studies with and without heterogeneity: no imbalance in the distribution of the effect modifier disease severity between the AB and AC comparisons.
Figure 8: Biased network meta-analysis of AB and AC studies with and without heterogeneity: imbalance in the distribution of the effect modifier disease severity between the AB and AC comparisons.
6. If yes (i.e., there are such systematic differences in treatment effect modifiers), were these imbalances in effect modifiers across the different treatment comparisons identified prior to comparing individual study results?
Frequently, several trial and patient characteristics differ across the different direct comparisons. Deciding which covariates are effect modifiers based on observed patterns in the results across trials can lead to false conclusions regarding the sources of inconsistency and to biased indirect comparisons [22,23]. It is recommended that researchers undertaking the network meta-analysis first generate a list of potential treatment effect modifiers for the interventions of interest, based on prior knowledge or reported subgroup results within individual studies, before comparing results between studies. Next, the study and patient characteristics that are judged likely effect modifiers have to be compared across studies to identify any imbalances between the different types of direct comparisons in the network.
Analysis
The next 13 questions pertain to the statistical methods used for the indirect comparison or network meta-analysis.
7. Were statistical methods used that preserve within-study randomization? (No naïve comparisons)
In order to acknowledge the randomization of treatment allocation within studies, and thereby minimize the risk of bias as much as possible, an indirect treatment comparison or network meta-analysis has to be based on the relative treatment effects within RCTs [5-15]. In other words, statistical methods need to be used that preserve within-study randomization. An invalid or naïve indirect comparison of RCTs that does not preserve randomization is presented in Figure 9. The naïve indirect comparison does not take into account any differences in study effects (represented by the white boxes for placebo) across trials. When the available RCTs form one evidence network, a naïve indirect comparison can be considered a fatal flaw.
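The distinction can be made concrete with invented arm-level means on some continuous outcome. The naïve comparison contrasts arms across trials directly, whereas an adjusted (Bucher-type) indirect comparison works with within-trial relative effects and so preserves randomization:

```python
# Hypothetical arm-level mean responses (all numbers invented)
ab_trial = {"placebo": 5.0, "B": 15.0}   # study effect: placebo response of 5
ac_trial = {"placebo": 10.0, "C": 22.0}  # study effect: placebo response of 10

# Naive indirect comparison: contrasts the B and C arms across trials,
# ignoring the different placebo responses (study effects) -- invalid
naive_C_vs_B = ac_trial["C"] - ab_trial["B"]          # 7.0

# Adjusted indirect comparison: based on within-trial relative effects,
# so within-study randomization is preserved
d_B_vs_placebo = ab_trial["B"] - ab_trial["placebo"]  # 10.0
d_C_vs_placebo = ac_trial["C"] - ac_trial["placebo"]  # 12.0
adjusted_C_vs_B = d_C_vs_placebo - d_B_vs_placebo     # 2.0
```

In this invented example the naïve comparison overstates the C-versus-B difference (7.0 versus 2.0) because the AC trial happens to have a higher placebo response.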
Figure 9: Naïve indirect comparison of RCTs that is invalid because differences in study effects are not acknowledged. In this example, the difference of C versus B is overestimated.
8. If both direct and indirect comparisons are available for pairwise contrasts (i.e., closed loops), was agreement in treatment effects (i.e., consistency) evaluated or discussed?
If a network has a closed loop, there is both direct and indirect evidence for some treatment contrasts (Figure 10). For example, in an ABC network that consists of AB trials, AC trials, and BC trials, direct evidence for the BC contrast is provided by the BC trials, and indirect evidence for the BC contrast by the indirect comparison of the AC and AB trials. If there are no systematic differences in treatment effect modifiers across the different direct comparisons that form the loop, then there will be no systematic differences between the direct and indirect estimates for each of the contrasts that are part of the loop [5,11,14,15,22]. Combining direct estimates with indirect estimates is then valid, and the pooled (i.e., mixed) result will reflect a greater evidence base and increased precision with respect to the treatment effect. However, if there are systematic differences in effect modifiers across the different direct comparisons of the network loop, the direct estimates will differ from the corresponding indirect estimates, and combining them may be inappropriate (Figure 11). Hence, in the presence of a closed loop, it is important that any direct comparisons be compared with the corresponding indirect comparisons regarding effect size or the distribution of treatment effect modifiers. However, statistical tests for inconsistency should not be over-interpreted, and their results should be weighed together with knowledge of the subject matter.
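One commonly used check is a Bucher-type z-statistic comparing the direct and indirect estimates of the same contrast. The log odds ratios and standard errors below are invented for illustration; as noted above, such tests tend to have low power and should not be over-interpreted.

```python
import math

# Hypothetical estimates (log odds ratio scale) in an ABC loop
d_BC_direct, se_direct = -0.20, 0.15   # pooled BC trials
d_AB, se_AB = 0.30, 0.12               # pooled AB trials
d_AC, se_AC = 0.05, 0.10               # pooled AC trials

# Indirect estimate of the BC contrast from the AB and AC comparisons
d_BC_indirect = d_AC - d_AB                        # -0.25
se_indirect = math.sqrt(se_AB**2 + se_AC**2)

# Inconsistency: difference between direct and indirect estimates
w = d_BC_direct - d_BC_indirect
se_w = math.sqrt(se_direct**2 + se_indirect**2)
z = w / se_w   # |z| > 1.96 would suggest statistically significant inconsistency
```

With these invented numbers |z| is well below 1.96, so there is no statistical signal of inconsistency, which of course does not prove consistency.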
Figure 10: Consistency between direct and indirect comparison
Figure 11: Inconsistency between direct and indirect comparison
9. In the presence of consistency between direct and indirect comparisons, were both direct and indirect evidence included in the network meta-analysis?
If there is a closed loop in an evidence network, the relative treatment effect estimates obtained with direct comparisons are comparable to those obtained with the corresponding indirect comparisons, and there is no imbalance in the distribution of the effect modifiers, then it is of interest to combine results of direct and indirect comparisons of the same treatment contrast in a network meta-analysis. The pooled result will then be based on a greater evidence base, with increased precision for the treatment effect, than when only direct evidence for the comparison of interest is considered [5,6,11,14,15].
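Under consistency, the gain in precision from combining both sources can be sketched with fixed-effect inverse-variance pooling of a direct and an indirect estimate of the same contrast (all numbers invented):

```python
import math

# Hypothetical direct and indirect estimates of the same BC contrast
d_direct, se_direct = -0.20, 0.15
d_indirect, se_indirect = -0.25, 0.16

# Inverse-variance weights: more precise evidence gets more weight
w_dir, w_ind = 1 / se_direct**2, 1 / se_indirect**2

# Mixed (pooled) estimate combining direct and indirect evidence
d_mixed = (w_dir * d_direct + w_ind * d_indirect) / (w_dir + w_ind)
se_mixed = math.sqrt(1 / (w_dir + w_ind))
```

The mixed estimate lies between the two inputs, and its standard error is smaller than that of either source alone, reflecting the greater evidence base.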
10. With inconsistency or an imbalance in the distribution of treatment effect modifiers across the different types of comparisons in the network of trials, did the researchers attempt to minimize this bias with the analysis?
In general, if there is an imbalance in the distribution of the effect modifiers across the different types of direct comparisons, transitivity does not hold and the corresponding indirect comparison is biased, and/or there is inconsistency between direct and indirect evidence (see Figures 8 and 11).
If a sufficient number of studies is included in the network meta-analysis, it may be possible to perform a meta-regression analysis in which the treatment effect of each study is a function not only of the treatment comparison of that study but also of an effect modifier [23-27]. In other words, with a meta-regression model we estimate the pooled relative treatment effect for a certain comparison based on the available studies, adjusted for differences in the level of the effect modifier between studies. This allows indirect comparisons of different treatments at the same level of the covariate, assuming that the estimated relationship between effect modifier and treatment effect is not greatly affected by (ecological) bias (see Figure 12). For example, if the proportion of subjects with severe disease differs among trials and disease severity affects the efficacy of at least one of the compared interventions, a meta-regression analysis can be used to adjust the indirect comparison estimates for the differences in the proportion of patients with severe disease. In addition, the model can predict indirect treatment comparison estimates for different proportions of patients with severe disease in a population. As an alternative, one can use the placebo response of a study as the covariate to adjust for any inconsistency, but this relies on the assumption that the study and patient characteristics (such as severity of illness) that are effect modifiers of the relative treatment effect are also prognostic factors of the outcome with placebo [22]. However, this is not a trivial task and requires appropriate model specification.
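The idea behind such a covariate adjustment can be sketched with a deliberately simplified example. A full network meta-regression is fitted jointly across the network with study-specific weights; here, as a toy illustration with invented data, we fit an unweighted ordinary least squares line to the observed effects of B versus A as a function of the proportion of patients with severe disease, and then predict the effect at a chosen covariate level:

```python
# Hypothetical AB studies: (x_i, d_i) where x_i is the proportion of patients
# with severe disease and d_i the observed relative effect of B vs A
studies = [(0.10, 9.4), (0.30, 8.2), (0.50, 7.0), (0.70, 5.8)]

n = len(studies)
mean_x = sum(x for x, _ in studies) / n
mean_d = sum(d for _, d in studies) / n

# Ordinary least squares for the meta-regression d_i = b0 + b1 * x_i
b1 = (sum((x - mean_x) * (d - mean_d) for x, d in studies)
      / sum((x - mean_x) ** 2 for x, _ in studies))
b0 = mean_d - b1 * mean_x

# Predicted relative effect of B vs A in a population with 40% severe disease
d_at_40pct = b0 + b1 * 0.4
```

Comparisons between treatments can then be made at a common level of the covariate, which is what removes the imbalance; model misspecification and ecological bias remain risks, as noted above.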
As an alternative to a meta-regression analysis, researchers can also attempt to use models with so-called inconsistency factors [27-29]. These network meta-analysis models explicitly acknowledge any difference between the direct and indirect relative treatment effect estimates of two interventions and are thereby less prone to biased estimates when direct and indirect evidence are pooled. These models apply only to treatment comparisons that are part of a closed loop in an evidence network.
In the absence of inconsistency and of differences in effect modifiers across the different comparisons, this question can be answered with “yes.” If there are inconsistencies or systematic differences in effect modifiers across comparisons, this question can also be answered with “yes” if statistical models are used that capture the inconsistency, or if meta-regression models are used that are expected to explain or adjust for the inconsistency or bias. Of course, one needs to be aware of the risk of model misspecification and of ecological bias when using only study-level data. If there is inconsistency but the researchers did not attempt to adjust for it, the question needs to be answered with “no.”
Figure 12: Network meta-analysis without and with adjustment for imbalance in an effect modifier using a meta-regression model. With this approach the treatment effects for AC and AB depend on the value of the effect modifier.
11. Was a valid rationale provided for the use of random effects or fixed effects models?
With multiple studies available for all or some of the direct comparisons in the network, it is possible that studies demonstrate clearly different relative treatment effects even though the same treatments were compared. When the differences in relative treatment effects are greater than can be expected due to chance, there is heterogeneity, and a random effects model is advocated [30]. A random effects model estimates the average relative treatment effect across the different levels of the responsible effect modifier. For example, if several trials compare intervention B with A and each trial has a different distribution of patients with moderate or severe disease (the effect modifier), then the heterogeneous study results can be pooled with a random effects model, which provides the average relative treatment effect across the different levels of disease severity. In the absence of heterogeneity, a fixed effect model can be defended. A fixed effect model assumes that the treatment effects are the same across the different trials included in the analysis.
A valid rationale for the use of random effects models includes the following: the presence of differences in known or likely treatment effect modifiers between studies comparing the same interventions (e.g., differences in the proportion of patients with severe disease among trials comparing the same interventions); the clear presence of variation in treatment effects among studies beyond what can be expected due to chance; or the conservative position that a random effects approach is what is expected in the studies addressing the specific clinical question.
A valid rationale for the use of a fixed effects model instead of a random effects model includes a judgment about the similarity of the studies with respect to important effect modifiers and the prior belief, based on experience in the relevant clinical field, that the intervention is likely to have a fixed relative effect irrespective of the populations studied. Often, model fit criteria are used to decide between models; the model with the better trade-off between fit and parsimony (the fixed effects model being the most parsimonious) is chosen. However, such approaches need to be adopted with caution because the statistical tests and measures of heterogeneity typically require a large number of studies to yield reliable results.
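The fixed versus random effects choice can be made concrete with the standard DerSimonian-Laird moment estimator of the between-study variance τ². For simplicity this sketch uses a single pairwise comparison (B versus A) with invented effects and standard errors rather than a full network:

```python
import math

# Hypothetical relative effects and standard errors from four B-vs-A trials
effects = [0.50, 0.10, 0.65, 0.05]
ses = [0.12, 0.12, 0.15, 0.10]

# Fixed effect: inverse-variance weights and pooled estimate
w = [1 / s**2 for s in ses]
d_fe = sum(wi * di for wi, di in zip(w, effects)) / sum(w)

# Cochran's Q and the DerSimonian-Laird estimate of tau^2
q = sum(wi * (di - d_fe) ** 2 for wi, di in zip(w, effects))
df = len(effects) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)   # Q well above df signals heterogeneity

# Random effects: weights incorporate the between-study variance
w_re = [1 / (s**2 + tau2) for s in ses]
d_re = sum(wi * di for wi, di in zip(w_re, effects)) / sum(w_re)
```

Here Q exceeds its degrees of freedom, τ² is positive, and the random effects estimate (the average effect across studies) differs from the fixed effect estimate and carries wider uncertainty; with only a handful of studies, such statistics are imprecise, which is the caution raised above.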
12. If a random effects model was used, were assumptions about heterogeneity explored or discussed?
With a random effects model, the between-study variation in treatment effects for the direct comparisons is explicitly taken into consideration by estimating a value for this heterogeneity, which is subsequently incorporated into the estimation of the summary effects. Since a network meta-analysis includes multiple treatment comparisons, one has the option to assume that the amount of heterogeneity is either the same for every treatment comparison or different. Frequently, however, it is assumed that the amount of heterogeneity is the same for all comparisons. Exploration, or at least a discussion, of this assumption is recommended. In many cases it is not possible to explore the assumption of comparison-specific heterogeneity with alternative statistical models because of limited data. If a valid rationale for the use of a fixed effects model was provided, answer this question with “yes.”
13. If there are indications of heterogeneity, were subgroup analyses or meta-regression analyses with pre-specified covariates performed?
Heterogeneity in relative treatment effects (i.e., true variation in relative treatment effects across studies comparing the same interventions) can be captured with random effects models, but the analysis will provide the average relative treatment effect across the different levels of the responsible effect modifier(s). This finding may not be very informative for decision-making, especially if there are great differences in relative treatment effects for the different levels of the effect modifiers [30]. It is more informative to estimate relative treatment effects for the different levels of the effect modifier, either with subgroup analyses or with a meta-regression analysis in which treatment effects are modeled as a function of the covariate, as illustrated in Figure 12.
Often there is a limited number of trials in a network meta-analysis, while many trial and patient characteristics may differ across studies. Deciding which covariates to include in the meta-regression models used for the network meta-analysis based on observed patterns in the trial data can lead to false conclusions regarding the sources of heterogeneity [23,24]. To avoid data dredging, it is strongly recommended to pre-specify the potential treatment effect modifiers that will be investigated.
In the absence of meta-regression analyses or subgroup analyses, this question can only be answered with “yes” if there are no indications of heterogeneity and a valid rationale for a fixed effects model was provided.
Reporting Quality & Transparency
The next 6 questions pertain to the transparency of the presentation of the evidence base and the results of the network meta-analysis. With a sufficiently transparent presentation, the credibility of the findings given the available studies can be accurately assessed and, if of interest, the analysis replicated.
14. Is a graphical or tabular representation of the evidence network provided, with information on the number of RCTs per direct comparison?
To help understand the findings of a network meta-analysis, an overview of the included RCTs is required. The evidence base can be summarized with an evidence network in which the available direct comparisons are reflected as edges (i.e., lines) between the different interventions, along with the number of RCTs per direct comparison. It is recommended that any trials comparing more than two interventions (i.e., trials with more than two arms) be highlighted. With such a network it is immediately clear for which treatment contrasts there is direct evidence, indirect evidence, or both [31]. A table with the studies in the rows, the interventions in the columns, and the observed individual study results for each intervention of each study in the cells is very informative as well.
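Such a network summary is easy to tabulate programmatically. The trial list below is hypothetical; arms are listed in a consistent (alphabetical) order so that edge keys match:

```python
from collections import Counter

# Hypothetical evidence base: the treatments compared within each RCT
rcts = [
    ("A", "B"), ("A", "B"),   # two AB trials
    ("A", "C"),               # one AC trial
    ("B", "C"),               # one BC trial
    ("A", "B", "C"),          # one three-arm trial (to be highlighted)
]

# Edges of the network: one edge per direct comparison, counting RCTs;
# a multi-arm trial contributes to every pairwise edge among its arms
edges = Counter()
for arms in rcts:
    for i in range(len(arms)):
        for j in range(i + 1, len(arms)):
            edges[(arms[i], arms[j])] += 1

# Multi-arm trials, which should be flagged in the network diagram
multi_arm = [arms for arms in rcts if len(arms) > 2]
```

The resulting counts (AB: 3, AC: 2, BC: 2) are exactly what a network diagram would display on its edges.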
15. Are the individual study results reported?
In order to assess the (face) validity of the results of the network meta-analysis, the individual study results need to be provided, either in the publication or in an online supplement. More specifically, presentation of the individual study results allows reviewers to compare these with the results of the network meta-analysis. It will also facilitate replication of the analysis, if desired.
16. Are results of direct comparisons reported separately from results of the indirect comparisons or network meta-analysis?
To judge whether the assumption of consistency between direct and indirect evidence holds, estimates of (pooled) direct comparisons can be compared with estimates obtained from the corresponding indirect comparisons. However, this is not a trivial task. A more pragmatic approach is to present the (pooled) direct evidence separately from the results of the network meta-analysis in which direct and indirect evidence for some comparisons (i.e., in the presence of closed loops) are combined. Although the absence of a difference between these two sets of results does not guarantee that there is no inconsistency, the opposite does hold: if the results based on direct evidence are systematically different from the results based on the combination of direct and indirect evidence, then the indirect evidence has to be inconsistent with the direct evidence.
17. Are all pairwise contrasts between interventions, as obtained with the network meta-analysis, reported along with measures of uncertainty?
With a network meta-analysis, relative treatment effect estimates between all the interventions included in the analysis can be obtained. For decision-making it is very informative when all these possible contrasts are presented. Equally important, for every relative treatment effect that is estimated, measures of uncertainty need to be presented (i.e., 95% confidence intervals or 95% credible intervals, which will be defined under the next question).
18. Is a ranking of interventions provided, given the reported treatment effects and their uncertainty, by outcome?
A network meta-analysis can be performed in a frequentist or a Bayesian framework. The result of a frequentist network meta-analysis comparing treatments is an estimate of the relative treatment effect along with a p-value and 95% confidence interval. The p-value indicates whether the results are statistically ‘significant’ or ‘non-significant’. It reflects the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis (e.g., no difference between treatments) is true [32]. This is not very useful for decision-making because it does not provide information on the probability that a hypothesis (e.g., one treatment is better than the other) is true or false. Moreover, when the decision-maker is faced with a choice between more than two treatments, the p-value associated with each pairwise comparison in a network meta-analysis becomes even more difficult to interpret because it does not provide information about the ranking of treatments. The 95% confidence intervals corresponding to the effect estimates obtained within a frequentist framework cannot be interpreted in terms of probabilities either; a 95% confidence interval does not mean there is a 95% probability that the true value lies between the boundaries of the interval.
Within the Bayesian framework, the treatment effects to be estimated are considered random variables [33]. The belief regarding the treatment effect size before looking at the data can be summarized with a probability distribution: the prior distribution. This probability distribution is updated after the data have been observed, resulting in the posterior distribution, which summarizes the updated belief regarding the likely values of the effect size. The output of a Bayesian network meta-analysis is a (joint) posterior distribution of all relative treatment effects between the interventions included in the network. The posterior distribution for each relative treatment effect can be summarized with a mean or median, reflecting the most likely value of the effect size, as well as the 2.5th and 97.5th percentiles: the 95% credible interval. Figure 13 summarizes the output obtained with frequentist and Bayesian network meta-analyses. In the frequentist framework we obtain relative treatment effects of each intervention relative to a control, along with 95% confidence intervals (and p-values). In the Bayesian framework we obtain posterior probability distributions summarizing the likely values of the treatment effect of each intervention relative to a control.
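For a single treatment effect, this prior-to-posterior updating can be sketched with a conjugate normal-normal model. All numbers are invented, and in practice Bayesian network meta-analysis posteriors are obtained by simulation (e.g., MCMC) rather than in closed form:

```python
import math

# Hypothetical normal-normal conjugate update for one relative treatment effect
prior_mean, prior_var = 0.0, 100.0    # vague prior belief about the effect
data_mean, data_var = -0.30, 0.04     # observed effect and its variance

# Posterior: precision-weighted combination of prior and data
post_var = 1 / (1 / prior_var + 1 / data_var)
post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)

# 95% credible interval: central 95% of the (normal) posterior distribution
lo = post_mean - 1.96 * math.sqrt(post_var)
hi = post_mean + 1.96 * math.sqrt(post_var)
# Interpretation: there is a 95% probability that the true effect lies in (lo, hi)
```

Because the prior is vague, the posterior mean sits essentially at the observed effect; the credible interval, unlike a confidence interval, supports the direct probability statement in the final comment.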
Figure 13: Frequentist versus Bayesian output of a network meta-analysis
Since the posterior distribution is a probability distribution, it allows for probability statements. Unlike the 95% confidence interval obtained within the frequentist framework, the 95% credible interval can be interpreted in terms of probabilities: there is a 95% chance that the true effect size falls between the boundaries of the credible interval. Consequently, in a network meta-analysis fitted within a Bayesian framework, the multiple inferences based on confidence intervals or p-values can be replaced with probability statements (see Figure 14): for example, “there is an x% probability that treatment C is better than B,” “there is a y% probability that treatment D is the most efficacious of treatments A, B, C, D, and E regarding this outcome,” or “there is a z% probability that intervention E is the least efficacious” [34].
The probabilities that each treatment ranks 1st, 2nd, 3rd, and so on out of all interventions compared are called rank probabilities and are based on the location, spread, and overlap of the posterior distributions of the relative treatment effects. Rank probabilities can be summarized in a graph with the rank (from 1 to the number of treatments in the analysis) on the horizontal axis and probability on the vertical axis. For each treatment, the probability is plotted against the rank and the points are connected by treatment: a rankogram, as illustrated in Figure 15 [28]. Note that solely presenting the probability of being the best can result in erroneous conclusions regarding the relative ranking of treatments, because interventions for which there is a lot of uncertainty (i.e., wide credible intervals) are more likely to be ranked best. The benefit of ranking probabilities is that they summarize the distribution of effects, thereby acknowledging both location and uncertainty.
Technically, ranking probabilities can also be obtained when a network meta-analysis has been performed in a frequentist framework, by using resampling techniques (such as bootstrapping). However, this implies a Bayesian interpretation of the frequentist analysis.
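Rank probabilities are computed by repeatedly drawing from the posterior distributions and counting how often each treatment occupies each rank. The sketch below uses invented, independent normal posterior summaries (a real joint posterior would carry correlations between effects) and assumes lower values are better:

```python
import random

random.seed(1)

# Hypothetical posterior summaries (mean, sd) of effects vs placebo;
# lower = better on this invented scale
posteriors = {
    "A": (0.00, 0.05),
    "B": (-0.30, 0.10),
    "C": (-0.25, 0.30),   # similar mean to B but much more uncertain
}

n_samples = 20000
rank_counts = {t: [0] * len(posteriors) for t in posteriors}

for _ in range(n_samples):
    # One joint draw: one value per treatment
    draw = {t: random.gauss(m, s) for t, (m, s) in posteriors.items()}
    # Sort treatments best (lowest) first; index 0 = rank 1
    for rank, t in enumerate(sorted(draw, key=draw.get)):
        rank_counts[t][rank] += 1

# Rank probabilities: the rows of a rankogram
rank_probs = {t: [c / n_samples for c in counts]
              for t, counts in rank_counts.items()}
p_best = {t: rank_probs[t][0] for t in posteriors}
```

In this invented example, the highly uncertain treatment C retains a sizeable probability of being best despite a slightly worse mean, which is exactly why the probability of being best should not be read in isolation from the full rankogram.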
Figure 14: Probabilistic interpretation of the posterior distribution within a Bayesian framework
Figure 15: ‘Rankograms’ showing the probability for each treatment to be at a specific rank in the treatment hierarchy
19. Is the impact of important patient characteristics on treatment effects reported?
If it has been determined that certain patient characteristics are effect modifiers and differ across studies, then it is of interest to report relative treatment effects for different levels of the effect modifier, as obtained with meta-regression or subgroup analyses. Factors of interest can include, for example, gender, severity of disease, distribution of biomarkers, and treatment history.
Interpretation
20. Are the conclusions fair and balanced?
If the conclusions are in line with the reported results of the network meta-analysis, the available evidence base, the credibility of the analysis methods, and any concerns about bias, then the conclusions can be considered fair and balanced.
Conflict of Interest
21. Were there any potential conflicts of interest?
Conflicts of interest may exist when an author (or the author's institution or employer) has financial or personal relationships or affiliations that could influence (or bias) the author's decisions, work, or manuscript.
22. If yes, were steps taken to address these?
In order to address potential conflicts of interest, all aspects of the conflicts should be noted (including the specific type and relationship of each conflict of interest), and the publication should be peer-reviewed. The contribution of each author should be clearly noted to document full disclosure of activities. In addition, a fair and balanced exposition, including the breadth and depth of the study's limitations, should be provided.
DISCUSSION
The objective was to develop a questionnaire to assess the relevance and credibility of network meta-analyses to help inform healthcare decision-making. Relevance concerns the extent to which the network meta-analysis is applicable to the problem faced by the decision-maker. Credibility concerns the extent to which the findings of the network meta-analysis are valid and trustworthy. The questionnaire also has an educational purpose: to raise awareness that evidence evaluation can be challenging and that important elements may not be obvious to all potential users.
The questionnaire assists evidence evaluators in applying a structured and consistent approach. The user provides his or her impression of the overall relevance and credibility of each network meta-analysis assessed. The questionnaire does not determine an overall impression or summary score. Although some may be interested in such scores to facilitate overall evidence synthesis, the use of such scores can be misleading. In addition, the applicability of a study may depend on whether there is any other evidence that addresses the specific issue or the decision being made. In general, an evidence evaluator needs to be aware of the strengths and weaknesses of each piece of evidence and apply his or her own reasoning.
User testing
The Task Force wanted to strike a balance between simplicity and comprehensiveness to ensure a questionnaire useful for end users who are not necessarily methodologists. Approximately 22 persons were solicited to participate in user testing; the response rate was 82%. Each volunteer was asked to evaluate three published network meta-analyses, rated by the Task Force as “good quality,” “medium quality,” and “poor quality,” respectively. There were not enough users to perform a formal psychometric evaluation of the questionnaire, but some interesting descriptive observations could be made. Multi-rater agreement, calculated as the percentage agreement over the response categories “yes,” “no,” “not applicable,” “not reported,” and “insufficient information” for each credibility question, exceeded 80% for 8 of the 22 questions. The average agreement score was 72%, with a range of 42%-91%. The lowest agreement scores were observed for credibility questions 5, 6, 10, 13, and 19, which relate to the key concepts of effect modifiers, inconsistency, and heterogeneity. Agreement in responses was greater for the well-reported, good quality study as determined by the Task Force. Missing ratings across the 4 credibility domains varied between 6% and 17%. For the overall summary question -- is the credibility “sufficient” or “insufficient”? -- there was 83% agreement among the ratings provided. The good quality study was generally rated as sufficient with respect to relevance and credibility, while the poor quality study was generally rated as insufficient. Based on these findings, it can be concluded that the questionnaire in its current form can be helpful for decision-making, but there is a clear need for education. Refinements of the questionnaire are expected to be made over time.
Educational needs
Across many international jurisdictions, the resources and expertise available to inform health care decision-makers vary widely. While there is broad experience in evaluating evidence from randomized controlled clinical trials, there is less experience with network meta-analysis among decision-makers. ISPOR has provided Good Research Practice recommendations on network meta-analysis [14,15]. This questionnaire is an extension of those recommendations and serves as a platform to assist decision-makers in understanding what a systematic evaluation of this research requires. By understanding what a systematic, structured approach to the appraisal of this research entails, decision-makers will, it is hoped, become more sophisticated users of this evidence. To that end, we anticipate that additional educational efforts and promotion of this questionnaire will be developed and made available. In addition, an interactive (i.e., web-based) questionnaire would facilitate uptake and support the questionnaire's educational goal.
Conclusion
The Task Force developed a consensus-based questionnaire to help decision-makers assess the relevance and credibility of indirect treatment comparisons and network meta-analyses to inform healthcare decision-making. The questionnaire aims to provide a guide for assessing the degree of confidence that should be placed in a network meta-analysis, and it enables decision-makers to gain awareness of the subtleties involved in evaluating these kinds of studies. It is anticipated that user feedback will permit periodic evaluation and modification of the questionnaire. The goal is to make the questionnaire as useful as possible to the health care decision-making community.
REFERENCES
1. Berger ML, et al. A Prospective Observational Study Questionnaire to Assess Study Relevance and Credibility to Inform Healthcare Decision-Making: An ISPOR-AMCP-NPC Good Practice Task Force Report. Value in Health 17:1.
2. Martin B, et al. A Retrospective Observational Study Questionnaire to Assess Study Relevance and Credibility to Inform Healthcare Decision-Making: An ISPOR-AMCP-NPC Good Practice Task Force Report. Value in Health 17:1.
3. Caro JJ, et al. A Modeling Study Questionnaire to Assess Study Relevance and Credibility to Inform Healthcare Decision-Making: An ISPOR-AMCP-NPC Good Practice Task Force Report. Value in Health 17:1.
4. Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.2 [updated September 2009]. The Cochrane Collaboration, 2009. Available from http://www.cochrane-handbook.org. [Accessed Feb 11, 2010].
5. Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005;331:897-900.
6. Ioannidis JPA. Indirect comparisons: the mesh and mess of clinical trials. Lancet 2006;368:1470-2.
7. Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics 2008;26:753-67.
8. Wells GA, Sultan SA, Chen L, et al. Indirect Evidence: Indirect Treatment Comparisons in Meta-Analysis. Ottawa: Canadian Agency for Drugs and Technologies in Health; 2009.
9. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997;50:683-91.
10. Song F, Altman DG, Glenny A, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003;326:472.
11. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004;23:3105-24.
12. Song F, Loke YK, Walsh T, et al. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ 2009;338:b1147.
13. Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence Synthesis for Decision Making 2: A Generalized Linear Modeling Framework for Pairwise and Network Meta-analysis of Randomized Controlled Trials. Med Decis Making 2013;33:607-17.
14. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting Indirect Treatment Comparisons & Network Meta-Analysis for Health Care Decision-making: Report of the ISPOR Task Force on Good Research Practices - Part 1. Value in Health 2011;14:417-28.
15. Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma C, Thompson D, Larholt KM, Diaz M, Barrett A. Conducting Indirect Treatment Comparison and Network Meta-Analysis Studies: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices - Part 2. Value in Health 2011;14:429-37.
16. Jüni P, Witschi A, Bloch R, Egger M. The Hazards of Scoring the Quality of Clinical Trials for Meta-Analysis. JAMA 1999;282:1054-60.
17. Ades AE, Caldwell DM, Reken S, Welton NJ, Sutton AJ, Dias S. Evidence Synthesis for Decision Making 7: A Reviewer's Checklist. Med Decis Making 2013;33:679-91.
18. Donegan S, Williamson P, Gamble C, Tudur-Smith C. Indirect comparisons: a review of reporting and methodological quality. PLoS One 2010;5:e11054.
19. Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savovic J, Schulz KF, Weeks L, Sterne JA; Cochrane Bias Methods Group; Cochrane Statistical Methods Group. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011;343.
20. Dwan K, Gamble C, Kolamunnage-Dona R, Mohammed S, Powell C, Williamson PR. Assessing the potential for outcome reporting bias in a review: a tutorial. Trials 2010;11:52.
21. Jansen JP, Crawford B, Bergman G, Stam W. Bayesian meta-analysis of multiple treatment comparisons: an introduction to mixed treatment comparisons. Value Health 2008;11:956-64.
22. Jansen JP, Schmid CH, Salanti G. Directed acyclic graphs can help understand bias in indirect and mixed treatment comparisons. J Clin Epidemiol 2012;65:798-807.
23. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559-73.
24. Schmid CH, Stark PC, Berlin JA, Landais P, Lau J. Meta-regression detected associations between heterogeneous treatment effects and study-level, but not patient-level, factors. J Clin Epidemiol 2004;57:683-97.
25. Cooper NJ, Sutton AJ, Morris D, et al. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat Med 2009;28:1861-81.
26. Salanti G, Marinho V, Higgins JP. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. J Clin Epidemiol 2009;62:857-64.
27. Dias S, Welton N, Marinho VCC, Salanti G, Ades AE. Estimation and adjustment of bias in randomised evidence using mixed treatment comparison meta-analysis. J R Stat Soc (A) 2010;173:613-29.
28. Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc 2006;101:447-59.
29. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med 2010;29:932-44.
30. Ades AE, Lu G, Higgins JP. The interpretation of random-effects meta-analysis in decision models. Med Decis Making 2005;25:646-54.
31. Salanti G, Kavvoura FK, Ioannidis JPA. Exploring the geometry of treatment networks. Ann Intern Med 2008;148:544-53.
32. Goodman SN. Toward evidence-based medical statistics: 1. The P value fallacy. Ann Intern Med 1999;130:995-1004.
33. Gelman AB, Carlin JS, Stern HS, Rubin DB. Bayesian Data Analysis. Boca Raton: Chapman and Hall-CRC; 1995.
34. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol 2011;64:163-71.
APPENDIX A: QUESTIONNAIRE TO ASSESS THE RELEVANCE AND CREDIBILITY OF A NETWORK META-ANALYSIS
The questionnaire consists of 26 questions related to the relevance and credibility of an indirect
treatment comparison or network meta-analysis.
Relevance questions relate to the usefulness of the network meta-analysis for informing health
care decision making. Each question is scored Yes / No / Can't Answer. Based on the scores
for the individual questions, the overall relevance of the network meta-analysis is judged as
Sufficient or Insufficient.
If the network meta-analysis is considered sufficiently relevant, its credibility is then assessed.
Credibility is captured with questions in the following 5 domains: evidence base used for the
indirect comparison or network meta-analysis; analysis; reporting quality and transparency;
interpretation; and conflict of interest. Each question is scored Yes / No / Can't Answer. Based
on the number of questions scored satisfactorily in each domain, an overall judgment of the
strength of each domain is provided: Strength / Neutral / Weakness / Fatal flaw. If any item is
scored No in a way that constitutes a fatal flaw, the domain as a whole is scored as a fatal flaw
and the network meta-analysis may have serious validity issues. Based on the domain
judgments, the overall credibility of the network meta-analysis is judged as Sufficient or
Insufficient.
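As a purely hypothetical illustration (this is not part of the Task Force's procedure, and the function name, score encoding, and fatal-item set are invented), the fatal-flaw rule described above could be operationalized as follows:

```python
def domain_judgment(scores, assessor_call, fatal_items=frozenset()):
    # scores: {question_number: "Yes" | "No" | "Can't answer"}
    # assessor_call: the assessor's own overall judgment for the domain
    #   ("Strength" | "Neutral" | "Weakness") when no fatal flaw applies;
    #   the mapping from item scores to that call is deliberately left to
    #   the assessor's judgment, not an algorithm.
    # fatal_items: question numbers where a "No" constitutes a fatal flaw
    #   (e.g., question 7 on naive comparisons in the Analysis domain).
    if any(scores.get(q) == "No" for q in fatal_items):
        return "Fatal flaw"
    return assessor_call

print(domain_judgment({7: "No", 8: "Yes"}, "Strength", fatal_items={7}))  # → Fatal flaw
```

The point of the sketch is only that a single fatal-flaw item overrides whatever overall call the assessor would otherwise make for the domain.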
Relevance

Each question is scored Yes / No / Can't answer; "Can't answer" is further specified as Not reported, Insufficient information, or Other reason*. The answer marked "Strength" counts as a strength and the answer marked "Weakness" as a weakness.

1. Is the population relevant? (Strength: Yes / Weakness: No)
2. Are any critical interventions missing? (Strength: No / Weakness: Yes)
3. Are any relevant outcomes missing? (Strength: No / Weakness: Yes)
4. Is the context (e.g., settings and circumstances) applicable to your population? (Strength: Yes / Weakness: No)

Comments:

* Other reasons can include insufficient training of the assessor. Please specify in the comments section.

Is the relevance of the network meta-analysis sufficient or insufficient for your decision-problem?
Credibility

Evidence base used for the indirect comparison or network meta-analysis

Each question is scored Yes / No / Can't answer (Not reported / Insufficient information / Other reason*).

1. Did the researchers attempt to identify and include all relevant randomized controlled trials?** (Strength: Yes / Weakness: No)
2. Do the trials for the interventions of interest form one connected network of randomized controlled trials? (Strength: Yes / Weakness: No)
3. Is it apparent that poor quality studies were included, thereby leading to bias? (Strength: No / Weakness: Yes)
4. Is it likely that bias was induced by selective reporting of outcomes in the studies? (Strength: No / Weakness: Yes)
5. Are there systematic differences in treatment effect modifiers (i.e., baseline patient or study characteristics that impact the treatment effects) across the different treatment comparisons in the network? (Strength: No / Weakness: Yes)
6. If yes (i.e., there are such systematic differences in treatment effect modifiers), were these imbalances in effect modifiers across the different treatment comparisons identified prior to comparing individual study results? (Strength: Yes / Weakness: No / Not applicable)

Overall judgment (Strength / Neutral / Weakness)

* Other reasons can include insufficient training of the assessor.
** To help answer this item, consider the following sub-questions:
- Was the literature search strategy defined in such a way that available randomized controlled trials concerning the relevant interventions could be identified?
- Were multiple databases searched (e.g., MEDLINE, EMBASE, Cochrane)?
- Were review selection criteria defined in such a way that relevant randomized controlled trials were included (if identified with the literature search)?
Analysis

Each question is scored Yes / No / Can't answer (Not reported / Insufficient information / Other reason*).

7. Were statistical methods used that preserve within-study randomization? (No naive comparisons.) (Strength: Yes / Weakness: No -> Fatal flaw)
8. If both direct and indirect comparisons are available for pairwise contrasts (i.e., closed loops), was agreement in treatment effects (i.e., consistency) evaluated or discussed? (Strength: Yes / Weakness: No / Not applicable)
9. In the presence of consistency between direct and indirect comparisons, were both direct and indirect evidence included in the network meta-analysis? (Strength: Yes / Weakness: No / Not applicable)
10. With inconsistency or an imbalance in the distribution of treatment effect modifiers across the different types of comparisons in the network of trials, did the researchers attempt to minimize this bias with the analysis?** (Strength: Yes / Weakness: No)
11. Was a valid rationale provided for the use of random effects or fixed effects models? (Strength: Yes / Weakness: No)
12. If a random effects model was used, were assumptions about heterogeneity explored or discussed?*** (Strength: Yes / Weakness: No)
13. If there are indications of heterogeneity, were subgroup analyses or meta-regression analysis with pre-specified covariates performed?*** (Strength: Yes / Weakness: No)

Overall judgment (Strength / Neutral / Weakness / Fatal flaw)

* Other reasons can include insufficient training of the assessor.
** In the absence of inconsistency and of differences in effect modifiers across comparisons, this item is scored "yes." If there are inconsistencies or systematic differences in effect modifiers across comparisons, this item is scored "yes" if models were used that capture the inconsistency, or if meta-regression models were used that are expected to explain or adjust for the inconsistency or bias.
*** If a valid rationale for a fixed effects model was provided, score "yes."
Reporting Quality & Transparency

Each question is scored Yes / No / Can't answer (Not reported / Insufficient information / Other reason*).

14. Is a graphical or tabular representation of the evidence network provided, with information on the number of RCTs per direct comparison? (Strength: Yes / Weakness: No)
15. Are the individual study results reported? (Strength: Yes / Weakness: No)
16. Are results of direct comparisons reported separately from results of the indirect comparisons or network meta-analysis? (Strength: Yes / Weakness: No)
17. Are all pairwise contrasts between interventions as obtained with the network meta-analysis reported along with measures of uncertainty? (Strength: Yes / Weakness: No)
18. Is a ranking of interventions provided, given the reported treatment effects and their uncertainty, by outcome? (Strength: Yes / Weakness: No)
19. Is the impact of important patient characteristics on treatment effects reported? (Strength: Yes / Weakness: No)

Overall judgment (Strength / Neutral / Weakness)

* Other reasons can include insufficient training of the assessor.
Interpretation

Each question is scored Yes / No / Can't answer (Not reported / Insufficient information / Other reason*).

20. Are the conclusions fair and balanced? (Strength: Yes / Weakness: No)

Overall judgment (Strength / Neutral / Weakness)

* Other reasons can include insufficient training of the assessor.

Conflict of Interest

21. Were there any potential conflicts of interest?** (Strength: No / Weakness: Yes)
22. If yes, were steps taken to address these?*** (Strength: Yes / Weakness: No)

Overall judgment (Strength / Neutral / Weakness)

* Other reasons can include insufficient training of the assessor.
** Conflicts of interest may exist when an author (or the author's institution or employer) has financial or personal relationships or affiliations that could influence (or bias) the author's decisions, work, or manuscript.
*** To address potential conflicts of interest, all aspects should be noted, including the specific type and relationship of the conflict of interest, and the publication should be peer-reviewed. The contribution of each author should be clearly noted to document full disclosure of activities. In addition, a fair and balanced exposition, including the breadth and depth of the study's limitations, should be provided.

Is the credibility of the network meta-analysis sufficient or insufficient?
APPENDIX B: GLOSSARY
Closed loop
A network of linked RCTs that is characterized by both direct and indirect
evidence for a particular comparison.
For example, when in addition to AB and AC studies the evidence base also
consists of BC studies, there is both direct and indirect evidence for each of
the three pairwise comparisons (AB, AC, BC).
(A network of only AB and AC trials does not have a closed loop; there is
only indirect evidence for the BC comparison.)
Confounding
Confounding bias in indirect comparisons or network meta-analysis implies that the estimated
relative treatment effect between two interventions, obtained by indirectly comparing results
from different trials, is affected by differences in the distribution of effect modifiers between
these trials. For example, if the AB trial is performed in a population with 70% severe disease
and 30% moderate disease and the AC trial is performed in a population with 30% severe
disease and 70% moderate disease -- and it is known that severity of disease influences the
efficacy of C or B or both -- then the indirect comparison of C versus B is affected by
confounding bias.
Note that in observational, non-randomized head-to-head studies, confounding bias arises if
there is a difference in prognostic factors or effect modifiers or both. In indirect comparisons of
RCTs, confounding bias arises only if there are imbalances in effect modifiers between the
trials being indirectly compared.
[Figures: network of AB and AC randomized controlled trials (not a closed loop); network of AB, AC, and BC randomized controlled trials (closed loop).]
Consistency
There is consistency when the treatment effects from head-to-head
comparisons are in agreement with treatment effects obtained from indirect
comparisons. Consistency between direct and indirect estimated effects is the
manifestation of transitivity. Transitivity refers to the case when there are no
systematic differences in treatment effect modifiers between the studies that
provide indirect evidence and the group of studies that provide direct
evidence.
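As a rough sketch of one way such agreement can be checked on a single closed loop (a Bucher-style z-test on the difference between the direct and indirect estimates; the log odds ratios below are made up for illustration):

```python
import math

def consistency_z(d_direct, se_direct, d_indirect, se_indirect):
    # z-statistic for the discrepancy between the direct and indirect
    # estimates of the same contrast (both on the log scale);
    # |z| > 1.96 flags inconsistency at the conventional 5% level.
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct**2 + se_indirect**2)
    return diff / se_diff

# Illustrative (invented) log odds ratios for the BC contrast:
z = consistency_z(d_direct=-0.30, se_direct=0.15,
                  d_indirect=-0.25, se_indirect=0.25)
print(round(z, 2))  # small |z|: no evidence of inconsistency
```

Published analyses typically use more general approaches (e.g., node-splitting in a full network model), but the underlying comparison is the same.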
Contrast
A pairwise treatment comparison of interest that can be estimated
based on direct evidence, indirect evidence, or both.
For example, the BC contrast (i.e. comparison) can be estimated with BC trials
(direct evidence), AB combined with AC trials (indirect evidence), or both.
Decision set
A set of randomized controlled trials for which at least one of the arms of each trial concerns an intervention of interest from a decision-making perspective.
Direct comparison / direct evidence
The evaluation of two or more treatment alternatives in a randomized (placebo-) controlled trial.
Effect-modifier
A study or patient characteristic that influences the true treatment effect of one intervention
relative to an active comparator or to a placebo.
For example, the relative risk of an adverse event with treatment B relative to A is greater
among men than women. Gender is the effect modifier.
Effect modifiers cause heterogeneity. If there is variation in effect modifiers across subgroups
or studies, there will be true variation in treatment effects among these subgroups or studies
comparing the same interventions.
Effect modifiers can cause inconsistency in network meta-analysis. If there is a real imbalance
in the distribution of effect modifiers between studies comparing different sets of interventions,
the corresponding indirect comparison is biased or invalid.
Evidence network
A network of randomized controlled trials in which each trial has at least one intervention in common with another trial.
Evidence set
An evidence network that includes all studies that provide either direct
evidence, indirect evidence, or both for the interventions of interest. This set
can include trials that do not include interventions of interest but provide a link
between trials evaluating interventions of interest.
For example, if the interventions B and D are of interest for a decision-maker
but are studied only in AB trials and CD trials, then AC trials provide relevant
evidence to facilitate this comparison.
Fixed effects model
A meta-analysis model to combine treatment effects from different studies
based on the assumption that all studies involving the same treatments aim to
estimate the same treatment effect. Observed variation in treatment effects for
a particular type of comparison (e.g., AB) is simply due to chance. (See Figure
7, example without heterogeneity.)
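The fixed effects assumption can be illustrated with a minimal inverse-variance pooling sketch; the log odds ratios and standard errors below are invented for illustration and not tied to any study in this report:

```python
import math

def fixed_effect_pool(effects, ses):
    # Inverse-variance fixed-effect pooling: each study is weighted by
    # 1/variance, under the assumption that all studies estimate the
    # same true treatment effect and differ only by chance.
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical AB trials reporting log odds ratios:
pooled, se = fixed_effect_pool([-0.50, -0.30, -0.45], [0.20, 0.25, 0.15])
print(round(pooled, 3), round(se, 3))  # pooled estimate and its standard error
```

Note how the most precise study (smallest standard error) receives the largest weight.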
Heterogeneity
Variation in treatment effects across trials beyond what can be expected due to chance. The
studies aim to estimate different, possibly context-specific, treatment effects; there are
systematic differences in known or unknown study or patient characteristics across studies that
influence the treatment effects.
[Figure: four AB studies with variation in true treatment effects (heterogeneity), caused by variation in disease severity (but not age) across the studies that influences the treatment effect.]
Indirect comparison / indirect
evidence
A comparison of interventions that were studied in different trials. For
example, the estimated treatment effects (e.g. odds ratios) of AB studies and
AC studies can be indirectly compared to get an estimate for a BC
comparison. (See Figure 4).
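A minimal numerical sketch of the adjusted indirect comparison (the Bucher method [9]), using made-up log odds ratios versus the common comparator A:

```python
import math

def bucher_indirect(d_ab, se_ab, d_ac, se_ac):
    # Adjusted indirect comparison: the BC effect is the difference of the
    # AC and AB effects, which cancels the common comparator A while
    # preserving within-study randomization.
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab**2 + se_ac**2)  # variances of the two sources add
    return d_bc, se_bc

d_bc, se_bc = bucher_indirect(d_ab=-0.40, se_ab=0.10, d_ac=-0.65, se_ac=0.12)
print(round(d_bc, 2), round(se_bc, 3))  # → -0.25 0.156
```

Because the variances add, the indirect estimate is always less precise than either of the direct estimates it is built from.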
Measures of uncertainty
95% confidence intervals, standard errors, p-values, or 95% credible intervals.
Meta-regression
A meta-analysis model to combine treatment effects from different studies that
aims to explain heterogeneity or reduce inconsistency by explicitly capturing
the impact of study or patient characteristics on treatment effects.
Mixed treatment comparisons
Combination of treatment effect estimates of both direct and indirect evidence for a particular treatment comparison.
Naïve comparison
Indirect comparison where differences in study effects (or placebo effects) are
ignored, which is not recommended. For example, the response with drug B
from an AB trial is compared with the response with drug C from an AC trial,
ignoring differences in response with treatment A.
Network meta-analysis
A statistical method to combine evidence regarding more than two
interventions from different RCTs that are part of one evidence network.
If the available evidence base consists of a network of RCTs involving direct
and indirect treatment comparisons, it can be synthesized with a network
meta-analysis.
Outcome
An endpoint used to measure the effect of a treatment; it can be an efficacy or a safety outcome.
Prior distribution
Within the Bayesian framework, treatment effects are considered random variables, and we are
interested in how likely a value for the effect is given the observed data. Because the treatment
effects to be estimated are random variables, the belief regarding the possible values of the
treatment effect before looking at the data obtained in a study can be summarized with a
probability distribution: the prior distribution. This distribution is updated after the data have
been observed, resulting in the posterior distribution, which summarizes the updated
probabilities of the possible values for the treatment effect.
Posterior distribution
The probability distribution obtained with a Bayesian analysis that summarizes
the probability of the possible values for the treatment effects we are interested
in. (See Prior distribution for more information.)
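As a toy example of the prior-to-posterior update, consider a conjugate beta-binomial model with invented counts (actual network meta-analyses typically rely on MCMC sampling rather than this closed form):

```python
# Prior belief about a response probability: Beta(a, b); a flat Beta(1, 1)
# prior treats all response rates as equally plausible before seeing data.
a_prior, b_prior = 1, 1
r, n = 12, 40  # hypothetical trial arm: 12 responders out of 40 patients

# Conjugate update: the posterior is again a Beta distribution.
a_post, b_post = a_prior + r, b_prior + (n - r)
post_mean = a_post / (a_post + b_post)
print(round(post_mean, 3))  # 13/42, i.e. roughly 0.31
```

The posterior mean sits between the prior mean (0.5) and the observed proportion (0.30), pulled almost entirely toward the data because the prior is weak.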
Prognostic factor
A patient or study characteristic that has an equal impact on the outcome of interest in the intervention and control arms of a randomized controlled trial.
Random effects model
A statistical model to combine treatment effects from different studies based
on the assumption that there is variation in the true treatment effects (i.e.,
heterogeneity) across studies for a particular treatment comparison (e.g., AB).
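One common frequentist way to estimate this between-study variation is the DerSimonian-Laird method; the sketch below uses invented log odds ratios and is only one of several possible estimators:

```python
import math

def dersimonian_laird(effects, ses):
    # DerSimonian-Laird random-effects meta-analysis: estimates the
    # between-study variance tau^2 from Cochran's Q and adds it to each
    # study's variance before inverse-variance pooling.
    w = [1.0 / se**2 for se in ses]
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - fixed)**2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    w_star = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Three hypothetical AB trials with visibly heterogeneous effects:
pooled, pooled_se, tau2 = dersimonian_laird([-0.8, -0.1, -0.5], [0.2, 0.2, 0.2])
print(round(pooled, 3), round(tau2, 3))
```

A positive tau^2 widens the pooled interval relative to a fixed effects analysis, reflecting the extra between-study uncertainty.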
Study effect
True outcome in a study if there is (or would have been) no intervention. The
study effect reflects everything that is going on in a study other than the
(active) intervention(s).
Transitivity
If treatment B is more efficacious than treatment A, and treatment C is more efficacious than
treatment B, then treatment C must be more efficacious than A. Transitivity is the key
underlying assumption for indirect treatment comparisons and network meta-analysis. The
implication of the transitivity assumption is that there are no systematic differences in treatment
effect modifiers between studies comparing different interventions (e.g., the distributions of
effect modifiers in AB studies, AC studies, and BC studies are similar). Transitivity manifests
as agreement between the relative treatment effects obtained from direct and indirect evidence.
Treatment effect
Difference in outcomes between an active intervention and a control compared in a randomized controlled trial.
Within-study randomization
Randomization of patients to the alternative interventions compared in a randomized controlled
trial. As a result, the treatment effect (i.e., the difference in the outcome of interest between the
interventions compared) is, in expectation, not affected by confounding bias.