RATING THE EVIDENCE: USING GRADE TO DEVELOP CLINICAL PRACTICE GUIDELINES

AHRQ Annual Meeting 2009: "Research to Reform: Achieving Health System Change"

September 14, 2009

Yngve Falck-Ytter, M.D., Case Western Reserve University, Cleveland, Ohio

Holger Schünemann, M.D., Ph.D., Chair, Department of Clinical Epidemiology & Biostatistics; Michael Gent Chair in Healthcare Research; McMaster University, Hamilton, Canada



Disclosure

In the past 5 years, Dr. Falck-Ytter received no personal payments for services from industry. His research group received research grants from Three Rivers, Valeant and Roche that were deposited into non-profit research accounts. He is a member of the GRADE working group, which has received funding from various governmental entities in the US and Europe. Some of the GRADE work he has done is supported in part by grant # 1 R13 HS016880-01 from the Agency for Healthcare Research and Quality (AHRQ).

Content

Part 1: Introduction
Part 2: Why revisit guideline methodology?
Part 3: The GRADE approach: quality of evidence
Part 4: The GRADE approach: strength of recommendations

Questions for the audience
• Involved in giving recommendations?
• Using any form of grading system?
• Familiarity with GRADE: Heard about GRADE before this conference? Read a GRADE article published by the GRADE working group? Attended a GRADE presentation? Attended a hands-on GRADE workshop?

Reassessment of clinical practice guidelines

Editorial by Shaneyfelt and Centor (JAMA 2009): “Too many current guidelines have become marketing and opinion-based pieces…”

“AHA CPG: 48% of recommendations are based on level C = expert opinion…”

“…clinicians do not use CPG […] greater concern […] some CPG are turned into performance measures…”

“Time has come for CPG development to again be centralized, e.g., AHRQ…”

Evidence-based clinical decisions

Research evidence

Patient values and preferences

Clinical state and circumstances

Expertise

Equal for all (Haynes et al. 2002)

Oxford Centre for Evidence-Based Medicine; http://www.cebm.net

Before GRADE

Level of evidence: source of evidence
• I: SR, RCTs
• II: Cohort studies
• III: Case-control studies
• IV: Case series
• V: Expert opinion

Grades of recommendation: A, B, C, D

Where GRADE fits in
• Prioritize problems, establish panel
• Systematic review: searches, selection of studies, data collection and analysis
• Assess the relative importance of outcomes
• Prepare evidence profile: quality of evidence for each outcome and summary of findings
• Assess overall quality of evidence
• Decide direction and strength of recommendation
• Draft guideline
• Consult with stakeholders and/or external peer reviewers
• Disseminate guideline
• Implement the guideline and evaluate

GRADE

GRADE – WHY REVISIT GUIDELINE METHODOLOGY?

Disclosure

Dr. Schünemann receives no personal payments for service from the pharmaceutical industry. The research group he belongs to received research grants from the industry that are deposited into research accounts. Institutions or organizations that he is affiliated with likely receive funding from for-profit sponsors that are supporting infrastructure and research that may serve his work. He is documents editor for the American Thoracic Society and co-chair of the GRADE Working Group.

Content

Why grade? Confidence in information and recommendations

Introduction to: quality of evidence; strength of recommendations

Please discuss: what is the difference between consensus statements and guidelines?

Be prepared to discuss your answer


There are no RCTs! Do you think that users of recommendations would like to be informed about the basis (explanation) for a recommendation or coverage decision if they were asked (by their patients)?

I suspect the answer is “yes”. If we need to provide the basis for recommendations, we need to say whether the evidence is good or not so good; in other words, perhaps “no RCTs”.

Hierarchy of evidence

Study design (risk of bias increases down the list):
• Randomized controlled trials
• Cohort studies and case-control studies
• Case reports and case series, non-systematic observations
• Expert opinion

Confidence in evidence
• There always is evidence: “When there is a question, there is evidence.”
• Better research, greater confidence in the evidence and decisions

Who can explain the following?
• Concealment of randomization
• Bias, confounding and effect modification
• Blinding (who is blinded in a double-blinded trial?)
• Intention-to-treat analysis and its correct application
• Why do trials stopped early for benefit overestimate treatment effects?
• P-values and confidence intervals

Hierarchy of evidence

Study design (risk of bias increases down the list):
• Randomized controlled trials
• Cohort studies and case-control studies
• Case reports and case series, non-systematic observations

Expert opinion (shown on this slide as a label running alongside the study designs)

Reasons for grading evidence?

• Appraisal of evidence has become complex and daunting
• People draw conclusions about the quality of evidence and strength of recommendations
• Systematic and explicit approaches can help protect against errors, resolve disagreements, communicate information and fulfil needs
• Change practitioner behavior
• However, wide variation in approaches

GRADE working group. BMJ. 2004 & 2008

Which grading system?

Recommendation for use of oral anticoagulation in patients with atrial fibrillation and rheumatic mitral valve disease:

Organization: evidence, recommendation
• AHA: B, Class I
• ACCP: A, 1
• SIGN: IV, C

What to do?

Recommendations vs statements!

Limitations of older systems & approaches
• confuse quality of evidence (levels of evidence) with strength of recommendations
• lack a well-articulated conceptual framework
• criteria not comprehensive or transparent
• focus on single outcomes

GRADE Quality of Evidence

In the context of a systematic review: the quality of evidence reflects the extent to which we are confident that an estimate of effect is correct.

In the context of making recommendations: the quality of evidence reflects the extent to which our confidence in an estimate of the effect is adequate to support a particular recommendation.

What makes you confident in health care decisions?

Confident in the evidence?

A meta-analysis of observational studies showed that bicycle helmets reduce the risk of head injuries in cyclists. OR: 0.31, 95% CI: 0.26 to 0.37

A meta-analysis of observational studies showed that warfarin prophylaxis reduces the risk of thromboembolism in patients with cardiac valve replacement. RR: 0.17, 95% CI: 0.13 to 0.24

GRADE: Quality of evidence

The extent to which our confidence in an estimate of the treatment effect is adequate to support a particular recommendation.

GRADE defines 4 categories of quality: high, moderate, low, very low.


Quality of evidence across studies
• Outcome #1: quality high
• Outcome #2: quality moderate
• Outcome #3: quality low

Determinants of quality

What is the study design?
• RCTs start high
• Observational studies start low

Determinants of quality

What lowers quality of evidence? 5 factors:
• Methodological limitations
• Inconsistency of results
• Indirectness of evidence
• Imprecision of results
• Publication bias

Methodological limitations: assessment of detailed design and execution (risk of bias)

For RCTs:
• Lack of allocation concealment
• No true intention-to-treat principle
• Inadequate blinding
• Loss to follow-up
• Early stopping for benefit

Allocation concealment (Schulz KF et al. JAMA 1995)

250 RCTs from 33 meta-analyses

Allocation concealment: effect (ratio of odds ratios) [95% CI]
• Adequate: 1.00 (reference)
• Unclear: 0.67 [0.60 to 0.75]*
• Not adequate: 0.59 [0.48 to 0.73]*

* significant
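A ratio of odds ratios (ROR) divides the pooled odds ratio from one group of trials (here, unclear or inadequate concealment) by the pooled odds ratio from the reference group (adequate concealment). The short sketch below shows how the table's figures translate into exaggerated effects, assuming outcomes are coded so that an OR below 1 favours the intervention. The reference OR of 0.80 is a made-up illustration, not a figure from Schulz et al., and the function names are hypothetical.

```python
def ratio_of_odds_ratios(or_subgroup: float, or_reference: float) -> float:
    """ROR = OR in the subgroup (e.g. inadequate concealment) / OR in the reference group."""
    return or_subgroup / or_reference


def biased_odds_ratio(or_reference: float, ror: float) -> float:
    """OR one would expect in the subgroup, given the reference OR and an ROR."""
    return or_reference * ror


def percent_exaggeration(ror: float) -> float:
    """How much smaller (more favourable, if OR < 1 means benefit) the subgroup OR is."""
    return (1.0 - ror) * 100.0


if __name__ == "__main__":
    or_adequate = 0.80  # hypothetical pooled OR in adequately concealed trials
    for label, ror in [("unclear", 0.67), ("not adequate", 0.59)]:
        or_biased = biased_odds_ratio(or_adequate, ror)
        print(f"{label}: expected OR {or_biased:.2f} "
              f"({percent_exaggeration(ror):.0f}% lower than {or_adequate:.2f})")
```

With these inputs, inadequately concealed trials would be expected to report an OR of about 0.47 instead of 0.80, that is, an effect that looks 41% larger on the odds scale.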

Studies stopped early because of benefit

Example: 5 vs 4 chemo-Rx cycles for AML
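Why do trials stopped early for benefit overestimate treatment effects? A small simulation makes the mechanism visible. The sketch below is an illustration only, assuming a normally distributed outcome with a known SD of 1, a true effect of 0.2, four equally spaced looks and a hypothetical stopping rule (stop if z > 2.5 at an interim look); none of these parameters come from the talk. Because a trial can only stop when its interim estimate happens to be large relative to its standard error, the early-stopped trials on average overstate the true effect.

```python
import math
import random
import statistics


def simulate_trial(true_effect, looks, z_stop, rng):
    """One two-arm trial with a normal outcome (SD 1 in both arms).

    `looks` are cumulative patients per arm at each planned analysis. The trial
    stops early for benefit if z = estimate / SE exceeds `z_stop` at any look
    before the last one. Returns (estimated effect, stopped_early).
    """
    sum_treat = sum_ctrl = 0.0
    n = 0
    estimate = 0.0
    for i, look in enumerate(looks):
        while n < look:
            sum_treat += rng.gauss(true_effect, 1.0)
            sum_ctrl += rng.gauss(0.0, 1.0)
            n += 1
        estimate = sum_treat / n - sum_ctrl / n
        se = math.sqrt(2.0 / n)  # SD is known to be 1 in each arm
        is_final = i == len(looks) - 1
        if not is_final and estimate / se > z_stop:
            return estimate, True
    return estimate, False


def compare_estimates(true_effect=0.2, looks=(100, 200, 300, 400),
                      z_stop=2.5, n_trials=1000, seed=2009):
    """Mean effect estimate in early-stopped vs completed trials."""
    rng = random.Random(seed)
    early, completed = [], []
    for _ in range(n_trials):
        estimate, stopped = simulate_trial(true_effect, looks, z_stop, rng)
        (early if stopped else completed).append(estimate)
    return statistics.mean(early), statistics.mean(completed), len(early)


if __name__ == "__main__":
    mean_early, mean_completed, n_early = compare_estimates()
    print("true effect: 0.20")
    print(f"stopped early for benefit ({n_early} trials): mean estimate {mean_early:.2f}")
    print(f"ran to completion: mean estimate {mean_completed:.2f}")
```

Changing `true_effect`, `looks` or `z_stop` shows how the size of the overestimate depends on the trial design.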

What about scoring tools?

Example: Jadad score (Jadad AR et al. Control Clin Trials 1996)
• Was the study described as randomized? 1
• Adequate description of randomization? 1
• Double blind? 1
• Method of double blinding described? 1
• Description of withdrawals and dropouts? 1

Max 5 points for quality
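The Jadad items above add up to a single 0 to 5 score. A minimal sketch follows; beyond the slide's five items, it includes the original instrument's rule of deducting a point when the randomization or blinding method is described but inappropriate. The class and field names are hypothetical. Note that the score rates what a report says; the Cochrane risk-of-bias graph on the next slide instead judges each bias domain separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialReport:
    """What a published trial report says (not how the trial was actually run)."""
    described_as_randomized: bool
    randomization_method_ok: Optional[bool]  # None = method not described
    described_as_double_blind: bool
    blinding_method_ok: Optional[bool]       # None = method not described
    withdrawals_described: bool


def jadad_score(report: TrialReport) -> int:
    """Jadad score from 0 to 5 points."""
    score = 0
    if report.described_as_randomized:
        score += 1
        if report.randomization_method_ok is True:
            score += 1
        elif report.randomization_method_ok is False:
            score -= 1  # described but inappropriate method: deduct a point
    if report.described_as_double_blind:
        score += 1
        if report.blinding_method_ok is True:
            score += 1
        elif report.blinding_method_ok is False:
            score -= 1  # described but inappropriate blinding: deduct a point
    if report.withdrawals_described:
        score += 1
    return score


if __name__ == "__main__":
    well_reported = TrialReport(True, True, True, True, True)
    sparse_report = TrialReport(True, None, False, None, False)
    print(jadad_score(well_reported), jadad_score(sparse_report))
```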

Cochrane Risk of bias graph in RevMan 5


Inconsistency of results

Look for explanations for inconsistency: patients, intervention, comparator, outcome, methods

Judgment:
• variation in size of effect
• overlap in confidence intervals
• statistical significance of heterogeneity
• I²


Heterogeneity: neurological or vascular complications or death within 30 days of endovascular treatment (stent, balloon angioplasty) vs. surgical carotid endarterectomy (CEA)
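Cochran's Q and the I² statistic named above can be computed from each study's estimate and standard error. A minimal sketch follows, assuming fixed-effect inverse-variance weights on the log scale; the three risk ratios and standard errors are hypothetical, and the function name is illustrative. A p-value for Q would additionally need the chi-squared distribution, which is left out to keep the example dependency-free.

```python
import math


def cochran_q_i2(estimates, std_errors):
    """Cochran's Q and I^2 for study estimates on an additive scale (e.g. log RR).

    Uses fixed-effect inverse-variance weights. I^2 = (Q - df) / Q, truncated at 0.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need two or more studies, each with a standard error")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 0.0 if q <= df else (q - df) / q
    return q, i2


if __name__ == "__main__":
    # Hypothetical risk ratios and standard errors of log(RR), for illustration only
    risk_ratios = [0.50, 0.60, 1.40]
    se_log_rr = [0.20, 0.25, 0.20]
    q, i2 = cochran_q_i2([math.log(rr) for rr in risk_ratios], se_log_rr)
    print(f"Q = {q:.1f} on {len(risk_ratios) - 1} df, I^2 = {i2:.0%}")
```

I² is the share of total variation across studies attributed to heterogeneity rather than chance; it is one input to the judgment above, alongside looking for explanations in patients, interventions, comparators, outcomes and methods.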

Indirectness of evidence

Indirect comparisons: interested in head-to-head comparison, drug A versus drug B (e.g., tenofovir versus entecavir in hepatitis B treatment)

Differences in:
• patients (early cirrhosis vs end-stage cirrhosis)
• interventions (CRC screening: flexible sigmoidoscopy vs colonoscopy)
• comparator (e.g., differences in dose)
• outcomes (non-steroidal safety: ulcer on endoscopy vs symptomatic ulcer complications)

Imprecision of results
• Small sample size
• Small number of events
• Wide confidence intervals
• Uncertainty about magnitude of effect

Imprecision: any stroke (or death) within 30 days of endovascular treatment (stent, balloon angioplasty) vs. surgical carotid endarterectomy (CEA)
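Few events mean wide confidence intervals. The sketch below computes a risk ratio with the usual large-sample 95% CI on the log scale and applies it to the pneumonia row of the oseltamivir evidence profile later in this talk (2/982 vs 9/662, which that profile downgrades for imprecise or sparse data). With only 11 events the interval runs from about 0.03 to 0.69, matching the profile to two decimals; the second call uses hypothetical counts ten times larger with the same event rates. The function name is illustrative, and zero-event cells (which need a continuity correction) are simply rejected.

```python
import math


def risk_ratio_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale.

    SE(log RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2), the standard large-sample formula.
    """
    if min(events_tx, events_ctl) <= 0:
        raise ValueError("zero events: a continuity correction is needed")
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Pneumonia row of the oseltamivir evidence profile: 2/982 vs 9/662
    print("observed: RR %.2f (%.2f to %.2f)" % risk_ratio_ci(2, 982, 9, 662))
    # Same event rates, ten times as many patients (hypothetical)
    print("10x size: RR %.2f (%.2f to %.2f)" % risk_ratio_ci(20, 9820, 90, 6620))
```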

Publication bias
• Reporting of studies (publication bias)
• Number of small studies

All phase II and III licensing trials for antidepressant drugs between 1987 and 2004 (74 trials; 23 were not published)
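The slide flags the number of small studies as a warning sign. One common way to examine such small-study effects, not named in the talk, is Egger's regression: each study's standard normal deviate (estimate divided by its SE) is regressed on its precision (1/SE), and an intercept far from zero suggests funnel-plot asymmetry. A minimal sketch on hypothetical data follows; a formal test would also need the intercept's standard error and a t-test, which are omitted. Asymmetry can also arise from causes other than publication bias, such as heterogeneity or chance.

```python
def egger_regression(estimates, std_errors):
    """Egger-style regression for funnel-plot asymmetry.

    Regresses estimate / SE on 1 / SE by ordinary least squares and returns
    (intercept, slope); an intercept far from 0 points to small-study effects.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 3:
        raise ValueError("need at least three studies, each with a standard error")
    z = [y / se for y, se in zip(estimates, std_errors)]
    precision = [1.0 / se for se in std_errors]
    n = len(z)
    mean_p = sum(precision) / n
    mean_z = sum(z) / n
    sxx = sum((p - mean_p) ** 2 for p in precision)
    if sxx == 0:
        raise ValueError("studies need different standard errors")
    sxy = sum((p - mean_p) * (zi - mean_z) for p, zi in zip(precision, z))
    slope = sxy / sxx
    intercept = mean_z - slope * mean_p
    return intercept, slope


if __name__ == "__main__":
    ses = [0.1, 0.2, 0.3, 0.4, 0.5]          # hypothetical standard errors (log scale)
    symmetric = [0.5 for _ in ses]           # same effect in large and small studies
    small_study = [0.2 + se for se in ses]   # smaller studies report bigger effects
    print("symmetric:  intercept %.2f" % egger_regression(symmetric, ses)[0])
    print("asymmetric: intercept %.2f" % egger_regression(small_study, ses)[0])
```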



What can raise the quality of evidence?


Quality assessment criteria

Study design (starting quality):
• Randomized trial: high
• Observational study: low

Lower if…
• Study limitations (design and execution)
• Inconsistency
• Indirectness
• Imprecision
• Publication bias

Higher if…
• Large effect (e.g., RR 0.5); very large effect (e.g., RR 0.2)
• Evidence of dose-response gradient
• All plausible confounding would reduce a demonstrated effect

Quality of evidence: high, moderate, low, very low
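The criteria above can be sketched as a small rating routine: start at high for randomized trials or low for observational studies, move down one or two levels per concern and up for the three upgrading factors, and clamp to the four categories. This is a toy under stated assumptions, not GRADE itself: GRADE asks for judgment rather than arithmetic, and rating up is normally reserved for observational evidence without serious limitations. The effect-size cut-offs reuse the slide's examples (RR 0.5, RR 0.2), treating an OR like an RR is an assumption, and all names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_FACTORS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "plausible_confounding"}


def effect_size_levels(relative_effect: float) -> int:
    """Levels to rate up for effect size: RR/OR <= 0.5 large (+1), <= 0.2 very
    large (+2). Effects above 1 are mirrored, so RR 5 is treated like RR 0.2."""
    if relative_effect <= 0:
        raise ValueError("relative effect must be positive")
    magnitude = min(relative_effect, 1.0 / relative_effect)
    if magnitude <= 0.2:
        return 2
    if magnitude <= 0.5:
        return 1
    return 0


def _check(factors: dict, allowed: set) -> None:
    for name, levels in factors.items():
        if name not in allowed:
            raise ValueError(f"unknown factor: {name}")
        if levels not in (0, 1, 2):
            raise ValueError(f"{name}: use 0, 1 or 2 levels")


def rate_quality(design: str, downgrades=None, upgrades=None) -> str:
    """Toy rating for one outcome across studies.

    design is "randomized" (starts high) or "observational" (starts low);
    downgrades/upgrades map factor names to the number of levels moved.
    """
    if design not in ("randomized", "observational"):
        raise ValueError("design must be 'randomized' or 'observational'")
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    _check(downgrades, DOWNGRADE_FACTORS)
    _check(upgrades, UPGRADE_FACTORS)
    start = LEVELS.index("high") if design == "randomized" else LEVELS.index("low")
    level = start - sum(downgrades.values()) + sum(upgrades.values())
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


if __name__ == "__main__":
    # Pneumonia row of the oseltamivir profile: indirectness -2, imprecision -1
    print(rate_quality("randomized", {"indirectness": 2, "imprecision": 1}))
    # Observational helmet data, OR 0.31 treated like an RR (an assumption)
    print(rate_quality("observational",
                       upgrades={"large_effect": effect_size_levels(0.31)}))
```

The first call prints "very low", matching the pneumonia row later in this talk; the second prints "moderate", which is only what this toy rule yields for the helmet example, not a formal GRADE judgement.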

Conceptualizing quality
• High: Further research is very unlikely to change our confidence in the estimate of effect.
• Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
• Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
• Very low: Any estimate of effect is very uncertain.

Clinical question (PICO)

Panel: select outcomes and rate their importance (critical, important, not important)

Quality rating of outcomes across studies: grade down or up (very low, low, moderate, high)

Overall quality of evidence

GRADE evidence profile

Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)

By considering: quality of evidence, balance of benefits/harms, values and preferences

Revise if necessary by considering: resource use (cost)

GRADE - FROM EVIDENCE TO DECISIONS

Strength of recommendations

Desirable effects:
• health benefits
• less burden
• savings

Undesirable effects:
• harms
• more burden
• costs

Developing recommendations

Strength of recommendation

“The strength of a recommendation reflects the extent to which we can, across the range of patients for whom the recommendations are intended, be confident that desirable effects of a management strategy outweigh undesirable effects.”

Strong or weak/conditional

Quality of evidence & strength of recommendation
• GRADE separates quality of evidence from strength of recommendation.
• The two are linked, but one does not automatically follow from the other.
• Other factors beyond the quality of evidence influence our confidence that adherence to a recommendation causes more benefit than harm.

What makes Guidelines Evidence-Based in 2009?

Standardized Reporting of Clinical Practice Guidelines: A Proposal from the Conference on Guideline Standardization. Checklist for reporting: 18 items

Ann Intern Med. 2003

14. Recommendations and rationale - state the recommended action precisely. Indicate the quality of evidence and the recommendation strength.


16. Patient preferences - describe the role of patient preferences when a recommendation involves a substantial element of personal choice or values.

A COPD guideline – do you want your review used like this?

Another COPD guideline

And another COPD guideline

What to do?

Current state of recommendations

Reviewed 7527 recommendations; 1275 randomly selected:
• Inconsistency across and within guidelines
• 31.6% did not clearly identify recommendations
• Most were not written as executable actions
• 52.7% did not indicate strength

Yale Guideline Corpus

1. Identify the critical recommendations in guideline text using semantic indicators.

2. Use consistent semantic and formatting indicators throughout the publication.

3. Group recommendations together in a summary section.

4. Do not use assertions of fact as recommendations.

5. Clearly and consistently assign evidence quality and recommendation strength in proximity to each recommendation, and distinguish between the distinct concepts of quality of evidence and strength of recommendation.

Challenges in wording recommendations
• Need to express (two) levels
• Need to express direction
• Differences across languages
• Need codes (letters, symbols, numbers)

Wording 1 / Wording 2 / Wording 3
• Strong recommendation for: We recommend… / Clinicians should… / We recommend…
• Weak recommendation for: We suggest… / Clinicians might… / We conditionally recommend…
• Weak recommendation against: We suggest…not / Clinicians might not… / We conditionally recommend…not
• Strong recommendation against: We recommend…not / Clinicians should not… / We recommend…not
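A guideline tool can keep the wording table as a lookup keyed by direction and strength, so every statement uses the same phrasing and carries its strength and quality of evidence right next to it (as the Yale corpus advice suggests). The sketch below is a minimal illustration; the dictionary mirrors the table, the sentence builder follows the "We recommend/suggest (against) using…" pattern used elsewhere in this talk, and the function names are hypothetical.

```python
# Lead-in phrases from the wording table, keyed by (direction, strength);
# the three entries are wording 1, 2 and 3.
WORDINGS = {
    ("for", "strong"): ("We recommend...", "Clinicians should...", "We recommend..."),
    ("for", "weak"): ("We suggest...", "Clinicians might...",
                      "We conditionally recommend..."),
    ("against", "weak"): ("We suggest...not", "Clinicians might not...",
                          "We conditionally recommend...not"),
    ("against", "strong"): ("We recommend...not", "Clinicians should not...",
                            "We recommend...not"),
}


def _validate(direction: str, strength: str) -> None:
    if (direction, strength) not in WORDINGS:
        raise ValueError("direction must be 'for'/'against', strength 'strong'/'weak'")


def wording(direction: str, strength: str, style: int = 1) -> str:
    """Phrase for one cell of the table (style 1, 2 or 3)."""
    _validate(direction, strength)
    if style not in (1, 2, 3):
        raise ValueError("style must be 1, 2 or 3")
    return WORDINGS[(direction, strength)][style - 1]


def format_recommendation(intervention: str, direction: str, strength: str,
                          quality: str) -> str:
    """'We recommend/suggest (against) using X', with strength and quality of
    evidence stated right next to the recommendation."""
    _validate(direction, strength)
    verb = "recommend" if strength == "strong" else "suggest"
    against = "against " if direction == "against" else ""
    return (f"We {verb} {against}using {intervention} "
            f"({strength} recommendation, {quality} quality evidence).")


if __name__ == "__main__":
    print(wording("against", "weak", style=2))
    print(format_recommendation("oseltamivir", "for", "strong", "very low"))
```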

Categories of recommendations

Although the degree of confidence is a continuum, we suggest using two categories: strong and weak/conditional.

Strong recommendation: the panel is confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects.

Weak recommendation: the panel concludes that the desirable effects of adherence to a recommendation probably outweigh the undesirable effects, but is not confident.

Recommend / Suggest?

Implications of a strong recommendation
• Patients: Most people in your situation would want the recommended course of action and only a small proportion would not.
• Clinicians: Most patients should receive the recommended course of action.
• Policy makers: The recommendation can be adopted as a policy in most situations.

Implications of a weak/conditional recommendation
• Patients: The majority of people in your situation would want the recommended course of action, but many would not.
• Clinicians: Be prepared to help patients to make a decision that is consistent with their own values.
• Policy makers: There is a need for substantial debate and involvement of stakeholders.

Case scenario

A 13-year-old girl who lives in rural Indonesia presented with flu symptoms and developed severe respiratory distress over the course of the last 2 days. She required intubation. The history reveals that she shares her living quarters with her parents and her three siblings. At night the family’s chickens share this room too, and several chickens had died unexpectedly a few days before the girl fell sick.

Interventions: antivirals, such as the neuraminidase inhibitors oseltamivir and zanamivir

Relevant healthcare question?

Clinical question:
• Population: Avian flu/influenza A (H5N1) patients
• Intervention: Oseltamivir (or zanamivir)
• Comparison: No pharmacological intervention
• Outcomes: Mortality, hospitalizations, resource use, adverse outcomes, antimicrobial resistance

WHO Avian Influenza GL. Schunemann et al., The Lancet ID, 2007

How would you make decisions?


Judgements about the strength of a recommendation
• No precise threshold for going from a strong to a weak recommendation.
• The presence of important concerns about one or more of these factors makes a weak recommendation more likely.
• Panels should consider all of these factors and make the reasons for their judgements explicit.
• Recommendations should specify the perspective that is taken (e.g. individual patient, health system) and which outcomes were considered (including which costs, if any).

Evidence Profile

Oseltamivir for treatment of H5N1 infection

Columns: quality assessment (no of studies (ref), design, limitations, consistency, directness, other considerations); summary of findings (no of patients: oseltamivir, placebo; effect: relative (95% CI), absolute (95% CI); quality); importance.

• Mortality: no studies (importance 9)
• Hospitalisation (healthy adults; hospitalisations from influenza, influenza cases only): randomised trial (TJ 06); no limitations; one trial only; major uncertainty about directness (-2); imprecise or sparse data (-1); OR 0.22 (0.02 to 2.16); very low (importance 6)
• LRTI (pneumonia, influenza cases only): randomised trial (TJ 06); no limitations; one trial only; major uncertainty about directness (-2); imprecise or sparse data (-1); 2/982 (0.2%) vs 9/662 (1.4%); RR 0.149 (0.03 to 0.69); very low (importance 8)
• Duration of disease (time to alleviation of symptoms/median time to resolution of symptoms, influenza cases only): randomised trials (TJ 06, DT 03); no limitations; important inconsistency (-1); major uncertainty about directness (-2); HR 1.30 (1.13 to 1.50); very low (importance 5)
• Viral shedding (mean nasal titre of excreted virus at 24h): randomised trials (TJ 06); no limitations; major uncertainty about directness (-2); no other considerations; WMD -0.73 (-0.99 to -0.47); low (importance 4)
• Minor adverse effects (number and seriousness of adverse effects): randomised trials (TJ 06); no limitations; some uncertainty about directness (-1); imprecise or sparse data (-1); OR range 0.56 to 1.80; low
• Duration of hospitalization; serious adverse effects (mention of significant or serious adverse effects); resistance; outbreak control; cost of drugs: no studies
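The "Absolute (95% CI)" column is blank in this profile. A common way to fill such a column is to apply the relative effect to a control-group risk. The sketch below assumes the placebo-arm risk from the pneumonia row (9/662, about 14 per 1000) purely for illustration; that risk comes from seasonal influenza, and for H5N1 patients the baseline risk, and therefore the absolute effect, could be very different. The function name is hypothetical.

```python
def absolute_effect_per_1000(control_risk, relative_risk):
    """Events per 1000 without and with the intervention, and how many fewer.

    Assumes the relative risk applies unchanged at the chosen control-group risk.
    """
    if not 0 <= control_risk <= 1 or relative_risk < 0:
        raise ValueError("control risk must be a proportion and RR non-negative")
    control = 1000 * control_risk
    intervention = control * relative_risk
    return control, intervention, control - intervention


if __name__ == "__main__":
    control_risk = 9 / 662  # placebo risk in the pneumonia row (seasonal influenza)
    for label, rr in [("point estimate", 0.149), ("CI bound", 0.03), ("CI bound", 0.69)]:
        control, treated, fewer = absolute_effect_per_1000(control_risk, rr)
        print(f"RR {rr}: {control:.1f} vs {treated:.1f} per 1000, "
              f"{fewer:.0f} fewer ({label})")
```

With these inputs the RR of 0.149 corresponds to about 12 fewer pneumonias per 1000 patients, ranging from about 4 to 13 fewer across the confidence interval.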

Oseltamivir for Girl with Avian Flu

Summary of findings:
• No clinical trial of oseltamivir for treatment of H5N1 patients.
• 4 systematic reviews and health technology assessments (HTA) reporting on 5 studies of oseltamivir in seasonal influenza. Hospitalization: OR 0.22 (0.02 to 2.16); pneumonia: OR 0.15 (0.03 to 0.69).
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: ~$45 per treatment course

What are the factors that determine your decisions?

GRADE: Factors influencing decisions and recommendations
• Quality of evidence
• Balance of desirable and undesirable consequences
• Values and preferences
• Cost

Determinants of the strength of recommendation

Factors that can strengthen a recommendation, with comments:
• Quality of the evidence: The higher the quality of evidence, the more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.
• Values and preferences: The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention (that is, the more resources consumed), the less likely a strong recommendation is warranted.

Determinants of the strength of recommendation

Factors that can weaken the strength of a recommendation. Example:

Decision / Explanation
• Lower quality evidence: □ Yes □ No
• Uncertainty about the balance of benefits versus harms and burdens: □ Yes □ No
• Uncertainty or differences in values: □ Yes □ No
• Uncertainty about whether the net benefits are worth the costs: □ Yes □ No

Table. Decisions about the strength of a recommendation. Frequent “yes” answers will increase the likelihood of a weak recommendation.

Oseltamivir – Avian Influenza

Factors that can weaken the strength of a recommendation. Example: treatment of H5N1 patients with oseltamivir

Decision / Explanation
• Lower quality evidence: Yes. The quality of evidence is very low.
• Uncertainty about the balance of benefits versus harms and burdens: Yes. The benefits are uncertain because several important or critical outcomes were not measured. However, the potential benefit is very large despite potentially small relative risk reductions.
• Uncertainty or differences in values: No. All patients and care providers would accept treatment for H5N1 disease.
• Uncertainty about whether the net benefits are worth the costs: No. For treatment of sporadic patients the price is not high ($45).

Frequent “yes” answers will increase the likelihood of a weak recommendation.
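The checklist can be kept as a small structured record that tallies the "Yes" answers but leaves the strength itself to the panel, since the talk stresses that there is no precise threshold between strong and weak and that the reasons for the judgement should be explicit. The sketch below uses hypothetical class names and encodes the oseltamivir example: two "Yes" answers, yet a strong recommendation, partly because no effective alternative is known.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Factor:
    question: str
    concern: bool            # True means "Yes" was ticked on the checklist
    explanation: str = ""


@dataclass
class StrengthDecision:
    recommendation: str
    factors: List[Factor] = field(default_factory=list)
    strength: str = ""       # the panel's judgement; deliberately not computed
    rationale: str = ""

    def concerns(self) -> List[str]:
        """Questions answered 'Yes'; these make a weak recommendation more likely."""
        return [f.question for f in self.factors if f.concern]

    def report(self) -> str:
        lines = [self.recommendation]
        for f in self.factors:
            answer = "Yes" if f.concern else "No"
            lines.append(f"- {f.question}: {answer}. {f.explanation}".rstrip())
        lines.append(f"{len(self.concerns())} of {len(self.factors)} answered 'Yes'; "
                     f"panel decision: {self.strength or 'pending'}. {self.rationale}".rstrip())
        return "\n".join(lines)


if __name__ == "__main__":
    decision = StrengthDecision(
        "Oseltamivir for confirmed or strongly suspected H5N1 infection",
        [
            Factor("Lower quality evidence", True, "Very low quality evidence."),
            Factor("Uncertain balance of benefits vs harms and burdens", True,
                   "Critical outcomes not measured; potential benefit very large."),
            Factor("Uncertainty or differences in values", False),
            Factor("Uncertain whether net benefits are worth the costs", False,
                   "About $45 per treatment course."),
        ],
        strength="strong",
        rationale="No known effective alternative at present.",
    )
    print(decision.report())
```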

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (????? recommendation, very low quality evidence).

Schunemann et al. The Lancet ID, 2007

Are values important? Should resources be considered?

Example: Oseltamivir for Avian Flu

Recommendation: In patients with confirmed or strongly suspected infection with avian influenza A (H5N1) virus, clinicians should administer oseltamivir treatment as soon as possible (strong recommendation, very low quality evidence).

Values and preferences remarks: This recommendation places a high value on the prevention of death in an illness with a high case fatality. It places relatively low value on adverse reactions, the development of resistance and costs of treatment.

Schunemann et al. The Lancet ID, 2007

Other explanations

Remarks: Despite the lack of controlled treatment data for H5N1, this is a strong recommendation, in part, because there is a lack of known effective alternative pharmacological interventions at this time.

The panel voted on whether this recommendation should be strong or weak and there was one abstention and one dissenting vote.

Systematic review

Guideline development

Formulate question (PICO)

Select outcomes and rate their importance: critical, important, not important

Create evidence profile with GRADEpro: summary of findings & estimate of effect for each outcome

Rate quality of evidence for each outcome across studies. RCTs start high, observational data start low.
Grade down:
1. Risk of bias
2. Inconsistency
3. Indirectness
4. Imprecision
5. Publication bias
Grade up:
1. Large effect
2. Dose response
3. Confounders
Quality: very low, low, moderate, high

Rate overall quality of evidence across outcomes based on lowest quality of critical outcomes

Panel: formulate recommendations:
• For or against (direction)
• Strong or weak (strength)

By considering: quality of evidence, balance of benefits/harms, values and preferences

Revise if necessary by considering: resource use (cost)

• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
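The rule "overall quality is based on the lowest quality of critical outcomes" can be sketched in a few lines. The 1 to 9 importance scale with 7 to 9 as critical and 4 to 6 as important is GRADE's usual convention rather than something stated on these slides (the evidence profile's importance column does use numbers in that range). Skipping critical outcomes that have no studies at all is an assumption of this sketch; a panel might instead treat missing critical evidence as a reason for lower confidence. The outcomes and function names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def importance_category(rating: int) -> str:
    """Map a 1-9 importance rating to a category (7-9 critical, 4-6 important)."""
    if not 1 <= rating <= 9:
        raise ValueError("importance is rated from 1 to 9")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important"
    return "not important"


def overall_quality(outcomes):
    """Lowest quality among critical outcomes that have any evidence.

    outcomes: iterable of (name, importance 1-9, quality level or None if no studies).
    Returns None when no critical outcome has evidence.
    """
    critical = [quality for _, importance, quality in outcomes
                if importance_category(importance) == "critical" and quality is not None]
    if not critical:
        return None
    return min(critical, key=LEVELS.index)


if __name__ == "__main__":
    # Hypothetical outcomes: (name, importance 1-9, quality or None if no studies)
    outcomes = [
        ("mortality", 9, None),
        ("pneumonia", 8, "very low"),
        ("viral shedding", 4, "low"),
    ]
    for name, importance, quality in outcomes:
        print(f"{name}: {importance_category(importance)}, "
              f"quality {quality or 'no studies'}")
    print("overall quality:", overall_quality(outcomes))
```

Only critical outcomes drive the overall rating, so a low-quality "important" outcome does not pull the overall quality down.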