

DRAFT GUIDELINES FOR WHO AND SAGE DEVELOPMENT OF EVIDENCE-BASED VACCINE RELATED RECOMMENDATIONS

Table of Contents

1. Introduction .......................................................... 2
   1.1 Background ........................................................ 2
   1.2 Past Use of GRADE in Vaccine Position Papers ...................... 4
2. SAGE Process for Reviewing the Evidence ............................... 4
   2.1 Definition of questions to inform recommendations ................. 5
   2.2 Identification of critical questions for which the GRADE approach should be applied ... 7
   2.3 Systematic review of the literature and of unpublished data ....... 7
   2.4 Identifying study limitations through risk of bias ................ 8
       2.4.1 Risk of bias in RCTs ........................................ 8
       2.4.2 Risk of bias in observational studies ....................... 9
       2.4.3 Impact of bias .............................................. 9
       2.4.4 Quality of systematic reviews and meta-analyses ............. 10
   2.5 Scoring of the Quality of Evidence ................................ 10
   2.6 Discussion and deliberation leading to the development of proposed recommendations ... 10
   2.7 Presentation of proposed recommendations to SAGE along with the supporting evidence ... 10
   2.8 SAGE discussion, deliberation, and ultimate decision regarding the proposed recommendations to WHO ... 11
3. Scoring of the Quality of Evidence .................................... 11
   3.1 Categorization of studies ......................................... 12
   3.2 GRADE quality assessment criteria ................................. 12
   3.3 Quality of evidence rating ........................................ 14
   3.4 Application of GRADE to Recommendations ........................... 14
   3.5 Presentation of GRADE Tables ...................................... 15
4. Vaccine Recommendation Development - Beyond Scoring the Evidence ...... 15
   4.1 Other considerations when making recommendations .................. 15
   4.2 Updating Recommendations .......................................... 16
   4.3 Emergency situations .............................................. 16
5. Conclusions ........................................................... 17
Appendix 1. Draft Data Extraction Tool ................................... 18
Appendix 2. Checklists for Reviewing Study Quality ....................... 20
   Appendix 2a. Checklist for RCTs ....................................... 21
   Appendix 2b. Checklist for Case-Control Studies ....................... 25
   Appendix 2c. Checklist for Cohort Studies ............................. 30
   Appendix 2d. Checklist for Systematic Reviews ......................... 35
Appendix 3. Draft Summary Table for Evidence Review ...................... 39
Appendix 4. Rating the Quality of the Evidence ........................... 40
Appendix 5. Template of a GRADE table used to score the quality of evidence ... 41


1. Introduction

Vaccines are one of the most successful public health interventions of all time. Millions of lives have been saved and disability averted through the advent of critical vaccines. Much work is devoted to the development and testing of vaccines, ultimately leading to their licensure and use in a population. However, availability of the products does not ensure their appropriate use.

The World Health Organization (WHO) is tasked to provide leadership in global health, shape research agendas, provide guidance and standards for public health practice, and provide support to country programmes.1 To fulfil its mission for vaccines, the WHO has since 1998 published vaccine position papers2 with global recommendations for vaccine use. Each position paper is specific to a vaccine-preventable disease and includes four sections: an introduction; a section providing information on the respective disease (disease epidemiology, the pathogen, the disease); a section providing information on the available vaccines (composition, safety, immune response, efficacy and effectiveness, cost-effectiveness, and any other relevant issues); and the WHO position on optimal vaccine use.

The Strategic Advisory Group of Experts on Immunization (SAGE) is an independent advisory committee tasked to advise the WHO on the development of policy related to vaccines and immunization.3 4 SAGE makes recommendations to the WHO on vaccine-relevant topics identified as priorities of public health importance. These recommendations are captured in the SAGE meeting reports and published in the Weekly Epidemiological Record following each meeting. All reports, meeting presentations, and background documents are available online.5

Since 2006, SAGE has been charged with reviewing WHO vaccine position papers. Working Groups of SAGE review the evidence relating to issues addressed in the vaccine position papers and propose recommendations for SAGE to consider. After discussion and deliberation, SAGE makes recommendations on the use of vaccines that are incorporated by WHO into the vaccine position papers.

1.1 Background

A careful review and consideration of the scientific evidence is a necessary step in recommendation and guideline development. The results of the full range of studies on a given topic should be carefully considered to identify trends in magnitude, geographic variability, and other factors that are important for impact and generalizability. To develop the most appropriate recommendations, committees should weigh the desirable and undesirable consequences on the basis of the best available evidence and take into account societal values and preferences. While the evidence reviewed is the result of scientific

1 http://www.who.int/about/role/en/index.html 2 Available at http://www.who.int/immunization/documents/positionpapers/en/index.html 3 SAGE Terms of Reference: http://www.who.int/immunization/sage/SAGE_TOR_1_September_2010.pdf 4 Duclos P, Okwo-Bele JM, Salisbury D. Establishing global policy recommendations and achieving global goals: the Strategic Advisory Group of Experts on immunization. Expert Review of Vaccines, February 2011;10(2):163-173. 5 http://www.who.int/immunization/sage/previous/en/index.html


endeavours, evaluating the quality of the evidence and making recommendations are activities that require expert interpretation and judgement in addition to rigorous scientific review. Factors taken into consideration include disease epidemiology and clinical characteristics, vaccine and immunization characteristics, economic considerations, health system opportunities, and interaction with other existing intervention and control strategies.

In addition to the results of studies themselves, consideration is given to the methodology and study design used to conduct such studies. It is generally accepted that randomized controlled trials (RCTs) are the gold standard for minimizing various forms of bias when looking for associations between interventions and health outcomes, but there are many characteristics of RCTs or observational studies that determine their quality and relevance. In some cases, faulty randomization or blinding may reduce the quality of an RCT below that of a well-designed observational study. Therefore, a review of the potential risks of bias and other aspects of study design quality is crucial when drawing conclusions from a study of any type. The quality of evidence reflects the extent to which confidence in the estimate of effect is adequate to support a particular decision or recommendation.

The Grading of Recommendations Assessment, Development and Evaluation (GRADE)6 approach is one of many frameworks developed over the years to assess the quality of evidence, and it has been adopted by the WHO. The use of the GRADE methodology to score the quality of evidence in support of key recommendations included in the WHO vaccine position papers was introduced in April 2007.7

In addition to recommendations for vaccine usage, SAGE also makes strategic recommendations regarding public health programmes and research priorities, which do not undergo formal GRADE scoring. However, SAGE recommendations are evidence based and follow the overall framework of GRADE: available data to support many critical policy decisions are assessed using the other steps of GRADE without formal scoring, while all critical recommendations for interventions are scored using the GRADE framework to assess the quality of the related evidence.

The formal GRADE process has been described elsewhere.6 In short, questions of importance related to a recommendation are identified, a systematic literature review is conducted to identify what is known to answer the question(s), and the quality of the relevant evidence is reviewed and scored. Five criteria (limitations in study design commensurate with the type of study, inconsistency, indirectness, imprecision, and publication bias) are used to downgrade the quality of evidence when studies do not meet the published standards, and three criteria (magnitude of the effect, dose-response gradient, and ability of the study to limit biases and control for confounding) are used to upgrade the quality of evidence when study results increase one's confidence in their validity. Based on this score, as well as other factors (balance between benefits and risks, societal values and preferences, and cost and resources), recommendations are made and rated as strong or weak.

The GRADE framework is an attempt to provide structure and guidance for objectively reviewing the quality of evidence and the risk of bias. Nevertheless, some decisions to up- and downgrade the evidence may be a matter of individual judgement. A hallmark of GRADE is

6Guyatt GH, Oxman AD, Vist G, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ, for the GRADE Working Group. Rating quality of evidence and strength of recommendations GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924-926 7Weekly Epidemiological Record; 25 May 2007, vol. 82, 21 (pp 181–196). Available at http://www.who.int/entity/wer/2007/wer8221.pdf


its aim to improve transparency in decision making. Although GRADE remains subject to some individual interpretation, interested parties are able to follow the logic and processes that led to a given conclusion, recommendation, and/or guideline. Such a process also promotes useful dialogue and opportunities to reassess the evidence as needed. The GRADE framework, and particularly the scoring process, has undergone and will continue to undergo improvements over time, based on the collaborative work of the open-ended GRADE working group.
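The downgrading and upgrading logic described above can be sketched as simple arithmetic. The sketch below is illustrative only, not an official GRADE tool: the criterion names mirror the five downgrading and three upgrading criteria listed above, but the additive one-or-two-points-per-criterion model is a simplification of judgements that in practice are made by expert reviewers.

```python
# Illustrative sketch of GRADE scoring arithmetic (not an official tool).
# Evidence starts at "high" (4) for RCTs and "low" (2) for observational
# studies, is downgraded per concern and upgraded per strength, and the
# result is clamped to the 1-4 scale.

RATING_LABELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}

DOWNGRADE_CRITERIA = (
    "study_limitations", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
)
UPGRADE_CRITERIA = (
    "large_effect", "dose_response", "bias_and_confounding_controlled",
)

def grade_score(study_type, downgrades=(), upgrades=()):
    """Return (score, label) for a body of evidence.

    downgrades/upgrades map a criterion name to 1 or 2 points,
    reflecting serious vs. very serious concerns (or strengths).
    """
    score = 4 if study_type == "rct" else 2
    for criterion, points in dict(downgrades).items():
        assert criterion in DOWNGRADE_CRITERIA, criterion
        score -= points
    for criterion, points in dict(upgrades).items():
        assert criterion in UPGRADE_CRITERIA, criterion
        score += points
    score = max(1, min(4, score))
    return score, RATING_LABELS[score]
```

For example, RCT evidence downgraded one point for imprecision would score as moderate, while observational evidence upgraded for a large effect could reach moderate or high.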

1.2 Past Use of GRADE in WHO Vaccine Position Papers

Since 2007, GRADE tables have accompanied WHO vaccine position papers and have been made available as attachments online.8 GRADE tables for vaccine position papers attempted to apply the GRADE framework strictly, although some GRADE evidence profiles and summary-of-findings tables9 were adjusted to the specific needs of vaccines and provided additional information in footnotes and narrative text as thought necessary. GRADE tables are applied only to issues regarding the effectiveness and safety of vaccines and are created for overall vaccine efficacy/effectiveness and safety and, occasionally, for other more specific considerations of the effectiveness/safety of the intervention (such as the duration of protection, schedule considerations, and use in subpopulations, such as specific age or risk groups or HIV-infected populations).

Over the past few years, SAGE members expressed concern about the use of the GRADE scoring methodology, and specifically about how it was applied to vaccines, as relevant and important data were sometimes excluded or given a low quality score despite providing convincing relevant evidence. When strictly applied, the GRADE scoring at times ranked the quality of evidence as low or moderate, which SAGE did not feel appropriately reflected the quality of the evidence base. This was particularly true for traditional vaccines for which, despite many years of successful field use and impact demonstrated through many observational studies or population impact demonstrated by rigorous surveillance, the evidence quality level could not be upgraded; in the end, the score was still that of low-quality evidence. Not only did these rankings present a problem for communicating the basis for a recommendation to use a vaccine, but there was concern that these low rankings could be misunderstood by the general public or misused by those promoting an anti-vaccine agenda.

Thus, instructions on how to apply the GRADE framework, including minor adjustments, were proposed based on SAGE members' review of the GRADE methodology and many years of experience working with vaccines and assessing the quality of data to inform public health policy. In addition, discussions were held with other national technical advisory groups, and a review of specific examples was conducted with the GRADE working group, which resulted in some adjustments to the GRADE scoring scheme itself. The following provides a framework that better fits the needs of vaccine evidence and integrates the GRADE approach with SAGE and WHO recommendation development processes.

2. SAGE Process for Reviewing the Evidence

8 http://www.who.int/immunization/documents/positionpapers/en/index.html 9 Guyatt GH, Oxman AD, Aklm EA et al. GRADE guidelines: 1. Introduction GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 64 (2011) 383-394


Complex issues are routinely examined in careful detail by SAGE Working Groups.10 Working Groups (WGs) review the evidence pertaining to a given topic and present proposals for recommendations to SAGE, which then discusses, deliberates, and ultimately provides its recommendations to WHO. Thus, the initial review of the evidence occurs in WGs. The key activities involved in creating evidence-based SAGE recommendations are as follows:

1. Definition of the questions to inform recommendations
2. Identification of the critical questions for which an in-depth review is needed
3. Systematic review of the literature, with or without meta-analysis, and, where necessary, implementation of research to address gaps in the evidence
4. Review of the quality of the evidence, in particular through assessment of the risk of bias and confounding
5. Scoring of the quality of the evidence (using the GRADE approach for data on safety and effectiveness)
6. Discussion and deliberation, leading to the development of proposed recommendations
7. Presentation of proposed recommendations to SAGE, along with the evidence used to support the recommendations
8. SAGE discussion, deliberation, and decision regarding the proposed recommendations to WHO

Each of these steps is discussed in the sections that follow. The guiding principles are that careful review and consideration of the evidence should precede the development of recommendations and that the entire process should be transparent.

2.1 Definition of questions to inform recommendations

An essential part of the recommendation development process is defining the information that will influence the making of a recommendation. There are many important factors to consider; in the case of vaccines, these may include the burden of the disease, the effectiveness and safety of a vaccine, and the optimal schedule for protection given programmatic realities. All of these may need to be considered in the general population, in different geographic regions, and in various subpopulations.

A well-accepted methodology for framing questions that address alternative management strategies in systematic reviews mandates carefully specifying the patient population, the intervention of interest, the comparator, and the outcomes of interest. The value of this methodology, popularly known as PICO (patient/intervention/comparator/outcome), in helping achieve focused recommendations is increasingly recognized and is proposed by GRADE.11 Outcomes of interest should be those important to patients: if patient-important outcomes are represented by a surrogate, they will frequently require down-rating of the quality of evidence for indirectness.

For a guideline, an initial rating of the importance of outcomes should precede the review of the evidence. One should specify all potential patient-important outcomes and make a preliminary classification of outcomes into those that are critical, those that are important but not critical, and those of limited importance. The first two classes of evidence will have bearing on guideline recommendations; the third may or may not. Since

10 http://www.who.int/immunization/sage/working_groups/en/index.html 11 Guyatt GH, Oxman AD, Kunz R. et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. Journal of Clinical Epidemiology 64 (2011) 395-400


GRADE decisions regarding the overall quality of evidence supporting a recommendation may depend on which outcomes are designated as critical for making the decision and which are not, it is important to define those critical outcomes. For pragmatic reasons, the scoring proposed in step 5 above will be applied only to those critical outcomes.

At the beginning of its work, a WG should come to consensus on the key questions for consideration so that a detailed literature review may be conducted. The questions of efficacy/effectiveness, safety, and burden of disease are generally key questions for the WG to factor into the development of proposed recommendations/options for vaccine use. Issues for the SAGE WG to consider when developing proposed recommendations, and for SAGE when making recommendations, include the following:

• Epidemiologic features of the disease
  o Disease burden, including age-specific mortality, morbidity, and societal impact
  o Specific risk groups
  o Epidemic potential
  o Disease occurrence over time (i.e. secular trends)
  o Serogroup or serotype distribution for serogroup- or serotype-specific vaccines
  o Changes in epidemiology over time
• Clinical characteristics
  o Clinical management of disease
  o Disease severity and fatality
  o Primary/secondary/tertiary care implications
  o Long-term complications of disease and medical requirements
• Vaccine and immunization characteristics
  o Efficacy
  o Effectiveness and population impact of the vaccine (including herd immunity)
  o Safety
  o Indirect effects (potential impact on strain selection, herd immunity, potential safety concerns of live attenuated vaccines in contacts of vaccinees)
  o Cold chain and logistical concerns
  o Vaccine availability
  o Vaccine schedules
  o Schedules' social and programmatic acceptability
  o Ability to reach the target populations
  o Ability to monitor programme impact
• Economic considerations
  o Cost of illness
  o Vaccine and vaccine delivery costs
  o Potential for vaccine price reductions
  o Vaccine cost and cost-effectiveness of immunization programmes
  o Affordability of immunization
• Health system opportunities and existence of and interaction with other existing intervention and control strategies
• Social impacts
• Legal considerations
• Ethical considerations
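The PICO framing and the three-tier classification of outcomes described in this section can be captured in a small data structure. This is a hypothetical sketch: the class and field names are our own invention, not a SAGE or GRADE template.

```python
from dataclasses import dataclass, field

# Importance tiers from GRADE: critical and important outcomes bear on
# the recommendation; outcomes of limited importance may not.
CRITICAL, IMPORTANT, LIMITED = "critical", "important", "limited"

@dataclass
class Outcome:
    name: str
    importance: str             # CRITICAL, IMPORTANT, or LIMITED
    is_surrogate: bool = False  # surrogates usually trigger downgrading
                                # for indirectness

@dataclass
class PicoQuestion:
    population: str
    intervention: str
    comparator: str
    outcomes: list = field(default_factory=list)

    def critical_outcomes(self):
        """Outcomes to which formal GRADE scoring would be applied."""
        return [o for o in self.outcomes if o.importance == CRITICAL]

# Hypothetical example question for a WG review:
q = PicoQuestion(
    population="infants under 12 months in high-burden settings",
    intervention="vaccine X, 3-dose schedule",
    comparator="no vaccination",
    outcomes=[
        Outcome("laboratory-confirmed disease", CRITICAL),
        Outcome("serious adverse events", CRITICAL),
        Outcome("seroconversion", IMPORTANT, is_surrogate=True),
        Outcome("injection-site pain", LIMITED),
    ],
)
```

Listing `q.critical_outcomes()` then yields exactly the outcomes that step 5 (formal scoring) would cover.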


2.2 Identification of critical questions for which the GRADE approach should be applied

Because many factors are considered in making recommendations, WGs will often identify many questions in the categories in section 2.1 for which answers will be sought. However, it is clearly impossible to dissect all issues to a granular level, and questions must be prioritized by the WG. Questions or outcomes that are particularly contentious and critical for decisions to implement an intervention are prioritized for GRADE scoring of the quality of evidence. The GRADE scoring needs to be applied only to the critical questions, preferably no more than five questions unless there are unusual circumstances.

The formal scoring is appropriate only for questions regarding an intervention (e.g. vaccine use), not for disease burden, economic considerations, or strategic recommendations (e.g. research gaps, the decision to pursue an eradication goal, etc.). However, an in-depth look at the evidence will be conducted even for those questions that are not formally GRADEd, and a systematic literature search needs to be performed. For assessing the quality of economic and cost-effectiveness evaluations, other guidelines (e.g. the WHO guide for standardization of economic evaluations of immunization programmes, http://whqlibdoc.who.int/hq/2008/WHO_IVB_08.14_eng.pdf) can be used.

In evidence-based recommendations, the steps listed above (steps 1-8) will always be conducted, with the exception of the formal scoring (step 5). It is the role of the WG to help identify the critical questions to be scored using GRADE.

2.3 Systematic review of the literature and of unpublished data

An essential step in the process is to conduct a careful literature review for data relevant to the questions at hand. The review of the literature should be presented to the WG for consideration and to ensure its completeness. Efforts should also be made to identify any unpublished but relevant data that would inform WG and SAGE deliberations. Literature searches should be carefully documented, transparent, and reproducible. Some of the literature reviews may be done in advance of a WG defining the questions (e.g. for safety and efficacy, which will likely always be considered critical). A list of relevant papers should be provided to the WG. The data should be extracted using a data extraction tool (e.g. Appendix 1) and consolidated to facilitate review by the WG. Data may be combined into a meta-analysis.

Literature searches are also important for identifying knowledge gaps and helping prioritize future research agendas. Those important areas where data are lacking should be highlighted by the WG and SAGE to encourage additional studies. In some instances, a final decision regarding a recommendation to use a vaccine will not be taken until the critical missing data are made available.

In rare instances, recommendations may be needed for interventions for which there is a very limited evidence base. For example, although a few case studies have reported infection with yellow fever vaccine virus in infants born to vaccinated mothers, data needed to evaluate this risk or association are not available. In these circumstances, what little evidence is available, together with results from related but indirect studies (e.g. studies evaluating other live vaccines given to pregnant women), carefully considered by the key experts, may be the foundation for a


recommendation. When only very low or low quality evidence is available but a recommendation must be formulated, a clear explanation should be provided.12

Data considered by SAGE and WHO may be published or unpublished. While RCTs are considered the gold standard for assessing the effect of an intervention, observational studies are important sources of data on vaccine effectiveness and safety and constitute a significant component of the body of evidence used for making recommendations. Types of available studies include randomized controlled trials, observational studies, outbreak investigations, country surveillance, programme evaluations, cost-effectiveness analyses, forecasting, and landscape analyses.
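Since literature searches must be carefully documented, transparent, and reproducible, the search strategy itself can be recorded as structured data alongside the extracted results. The sketch below uses illustrative, hypothetical field names and values (the question label, query, and counts are invented examples, not a WHO standard):

```python
import json
from datetime import date

# Record enough detail that another reviewer could rerun the search and
# reconcile the yield; one record per database searched.
search_record = {
    "question_id": "Q1-efficacy",        # hypothetical WG question label
    "database": "PubMed",
    "search_date": date(2011, 6, 1).isoformat(),
    "query": '("vaccine X" OR "antigen Y") AND (efficacy OR effectiveness)',
    "limits": {"languages": ["en", "fr"], "years": "1990-2011"},
    "records_retrieved": 412,
    "records_after_screening": 37,
    "unpublished_sources_contacted": ["manufacturer",
                                      "national surveillance programmes"],
}

# Serializing the record makes it easy to archive with the WG's
# background documents and to reproduce the search later.
archived = json.dumps(search_record, indent=2, sort_keys=True)
```

A record like this can accompany the list of relevant papers provided to the WG.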

2.4 Identifying study limitations through risk of bias

As important studies are identified, they should be documented in a summary table (e.g. Appendix 3). This will allow for easier comparison and evaluation of studies for scoring. A number of characteristics may put studies at risk of bias (that is, of systematic errors or deviations from the true results) that could affect internal validity, and these need to be considered when reviewing the quality of the evidence. The specific characteristics depend on the type of study and the outcomes evaluated.

Both the Cochrane Collaboration and the Critical Appraisal Skills Programme have developed useful tools for evaluating study quality. Standardized approaches to evaluating the quality of non-randomized trials are less well developed, although some guidance is available, as follows. The tools listed below are adapted from the Cochrane Handbook13 and the Critical Appraisal Skills Programme.14 As noted in the Cochrane Handbook, there are other important aspects of study quality (e.g. reporting quality and ethical approval) that are not addressed in this section; rather, the primary focus is on the risk of bias that could affect the interpretation of the results.

For reviewers of studies of vaccines, the draft data extraction tool (Appendix 1) permits consideration of these factors and may be used to evaluate the limitations of individual studies. Appendix 2 provides four checklists developed by the Critical Appraisal Skills Programme that may be used and adapted to assess study methods and potential limitations of vaccine studies.

2.4.1 Risk of bias in RCTs When properly conduced and of adequate size, RCTs have the lowest risk for bias. The Cochrane Collaboration highlights six characteristics to consider concerning the risk of bias in RCTs especially confounding these:

• Sequence generation refers to the method of randomly allocating an intervention to study participants

12 For example, in the 2007 Vaccine Position Paper on Rotavirus vaccine, WHO states "…until the full potential of the current rotavirus vaccines has been confirmed in all regions of the world, in particular in Asia and Africa, WHO is not prepared to recommend global inclusion of rotavirus vaccines into national immunization programmes." WHO later amended the recommendation once data were available supporting widespread use. This recommendation was also influenced by consideration of factors other than the quality of the evidence (see section 4.1) 13 Available at http://www.cochrane-handbook.org/. In particular, see chapters 8 and 13. 14 http://www.sph.nhs.uk/what-we-do/public-health-workforce/resources/critical-appraisals-skills-programme


• Allocation sequence concealment refers to the prevention of knowledge (or prediction) of intervention assignment by study participants and investigators

• Blinding refers to the masking of the assigned intervention from study participants and investigators

• Incomplete outcome data may be the result of participant drop out (missing data) or exclusion of data from the study results

• Selective reporting (i.e. reporting bias) is the incomplete publication of results based on the direction or significance of the findings

• Other sources of bias may include design-specific risks of bias, early stopping, baseline imbalance, blocking of experimental units in unblinded studies, differential diagnostic activity, and other issues.

For more detail on each of these, see the Cochrane Handbook (Chapter 8).15 Each feature should be evaluated to determine the risk of bias in each study (using the data extraction tool and checklists), and the results should then be documented in the Summary Table for Evidence Review (Appendix 3).

2.4.2 Risk of bias in observational studies

Observational studies are particularly susceptible to selection bias and confounding. Because different types of observational studies carry different risks of bias, it is more challenging to standardize the evaluation of bias across study types. Two checklists have been included in Appendix 2 for reviewing the quality and risks of bias in case-control and cohort studies; they can be modified for other study designs. For interventions at the individual level, the Cochrane Collaboration16 suggests consideration be given to differences in the comparison groups or within participants over time; allocation to comparison groups and potential temporal, geographic, treatment, or other differences that could bias the results; and prospective and retrospective aspects of the studies. A clear description of potential confounders, the direction in which they would likely bias the results, and what the authors did to address confounding (e.g. matching, stratification, modeling) should also be provided. All of these features are included in the data extraction tool and checklists to aid reviewers' consideration of the risks of bias. The collective results should then be reflected in the GRADE scoring under the "Limitations" criterion (see section 3).

2.4.3 Impact of bias

After carefully reviewing each study for potential biases, an overall assessment of the evidence for bias, including the likely direction(s) and magnitude of the bias(es), should be made and taken into account. If many of the studies that constitute the evidence base have a high risk of bias, any conclusions from the body of evidence must be drawn carefully. Studies at high risk of bias may be excluded if the results are deemed too unreliable to consider.

15 Higgins JPT, Altman DG (editors). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.1 (updated September 2008). The Cochrane Collaboration, 2008. Available from www.cochrane-handbook.org. 16 Reeves BC, Deeks JJ, Higgins JPT, Wells GA. Chapter 13: Including non-randomized studies. In: Higgins JPT, Green S (editors), Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.2 (updated September 2009). The Cochrane Collaboration, 2009. Available from www.cochrane-handbook.org.


2.4.4 Quality of systematic reviews and meta-analyses

Systematic reviews and meta-analyses can be useful tools for evaluating effects across studies. Their validity depends on the completeness of the study search, the assessment of the quality of the included studies, the appropriateness of combining data across studies, and the relevance of the outcomes considered. In reviewing the quality of an existing systematic review, careful attention should be paid to the search methodology, heterogeneity, and inclusion/exclusion criteria (particularly for observational studies), in addition to the attributes discussed above for individual studies. If any of these are in question, the results of the systematic review should be viewed cautiously. Some reviews do not consider all of the data that may be relevant to an assessment of vaccine efficacy and safety (e.g. observational studies, outbreak investigations, surveillance reports, etc.). Appendix 2 provides a checklist to use when reviewing the quality of systematic reviews. In some cases, a systematic literature review may already have been done by WHO or another group (e.g. the Cochrane Collaboration), independent of or on behalf of WHO. Previous reviews may serve as the basis for analyzing the evidence base, although a search should be conducted to ensure that studies published since the previous review are not missed.

2.5 Scoring of the Quality of Evidence

Please see Section 3.

2.6 Discussion and deliberation leading to the development of proposed recommendations

WGs meet on a regular basis until they have completed all objectives in their Terms of Reference, which may take 6 to 12 months. WGs often meet 1-3 times in person and participate in frequent (often monthly) conference calls. During these meetings, WG members review the evidence (provided in the form of presentations from WHO, outside consultants, and/or WG members), highlight issues, and make proposals for recommendations. Draft documents (such as background papers, summaries of the evidence, etc.) and presentations to SAGE are discussed and vetted by the WG, and proposed recommendations are agreed upon by consensus. For additional information on WGs, please see Annex 3 of the SAGE Terms of Reference.17

2.7 Presentation of proposed recommendations to SAGE along with the supporting evidence

WG chairs (who are also SAGE members) present their proposed recommendations to SAGE, and SAGE receives updates throughout the WG's work. For each recommendation proposed by the WG, a written rationale with supporting evidence should be provided (for an example, see the background material for pertussis vaccines18), along with the important considerations underlying each recommendation. The recommendations and rationale are provided in advance of the SAGE meeting. These elements are also summarized in a presentation given by the WG chair to SAGE.

17 http://www.who.int/immunization/sage/SAGE_TOR_1_September_2010.pdf 18http://www.who.int/immunization/sage/Summaries_of_evidence_in_support_of_proposed_recommendations_Final.pdf


The format in which data and their synthesis are provided and presented to the WG will depend upon the WG's terms of reference. In addition to the point-by-point recommendations and justifications, additional background materials will often be appropriate. In general, when providing evidence in support of recommendations for a new vaccine, an in-depth background paper should be provided to SAGE. For many recommendations, an appropriate format for displaying the evidence is to organize it under the major categories of disease epidemiology, clinical characteristics, vaccine and immunization characteristics, economic considerations, health system opportunities, and existence of and interaction with other existing intervention and control strategies (section 2.1). The amount of information presented and the level of detail will depend on the topic at hand.

2.8 SAGE discussion, deliberation, and ultimate decision regarding the proposed recommendations to WHO

Prior to the SAGE meeting, SAGE members will have received previous updates from the WG, meeting minutes from all teleconferences and in-person meetings, and background materials important to the WG's deliberations. During the SAGE session on the topic at hand, SAGE members discuss and deliberate upon the WG's proposed recommendations in the open forum of a SAGE meeting. SAGE members may adopt the WG's proposed recommendations or make necessary adjustments. SAGE adopts recommendations by consensus; the recommendations are then transmitted to WHO for incorporation into a WHO vaccine position paper. More information on SAGE and its role in policy development is available in Duclos et al.

3. Scoring of the Quality of Evidence

The GRADE approach19 and its application to SAGE vaccine recommendations are described below. SAGE has fully embraced the GRADE methodology, with only minor adaptations to strengthen its relevance to immunization. Many of the adjustments to the more traditional presentation of the GRADE tables are an attempt to clarify its application to vaccine/vaccination recommendations without changing its intent. The adjustments ensure that the many types of data available for immunizations are reflected in the decision-making process. Vaccine development and testing have occurred over many decades, and many old vaccines are still used today. Therefore, the evidence base used to formulate recommendations often includes studies spanning a long time horizon, and because randomized controlled trials are unethical once the impact of protection is evident, many data stem from observational studies. When robust RCTs exist, the scoring of the evidence concerning efficacy need only include those RCTs. However, when observational studies are an important part of the body of evidence used to formulate recommendations in addition to RCTs, separate tables must be constructed for each category of study and reviewed in totality. Throughout the evidence review process (steps 1-8), expert opinion is critical in the assessment of these factors and their importance to the question under consideration. The application of the GRADE criteria and the inferences that may be drawn from the studies relating to the question under consideration are inherently subjective and rely on the judgement of skilled and experienced public health professionals. Active participation of the WGs is essential to ensure that the most appropriate studies are utilized and the results are carefully considered. In addition to formulating the questions for GRADE, the WGs will review the evidence and the resulting GRADE tables.

19 Balshem H, Helfand M, Schünemann HJ, et al. GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology 2011;64:401-406.

3.1 Categorization of studies

Studies enter the GRADE system at a particular level based initially on their study design. Because not all studies of a particular design are equal, the GRADE approach provides a framework to up- or down-grade the score of the evidence based on methodological and quantitative assessment. To begin, however, all RCTs enter at level 4 (⊕⊕⊕⊕), and observational studies and surveillance data enter at level 2 (⊕⊕). The GRADE criteria should then be applied to the studies, although studies should not be repeatedly penalized for limitations already factored into their starting score. For example, a controlled observational study that enters the scoring system at level 2 (⊕⊕) should not be further downgraded because it was not randomized. Passive surveillance data of uncertain quality should, however, likely be downgraded through application of some of the limiting factors. Only primary data sources should be entered into the table. Mathematical models do not represent primary data but build on other sources of information, and therefore should not be reflected in the GRADE tables.
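The entry levels and bounded up-/down-grading described above can be sketched as a small illustrative function. This is a hypothetical helper of our own, not part of any GRADE tooling; the actual down- and upgrade decisions are reviewer judgements, and the code only encodes the arithmetic stated in the text:

```python
# Illustrative sketch only: GRADE scoring is a judgement-based process;
# this helper merely encodes the starting levels and the 1-4 bounds.

STARTING_LEVEL = {
    "rct": 4,            # RCTs enter at level 4 (++++)
    "observational": 2,  # observational studies enter at level 2 (++)
    "surveillance": 2,   # surveillance data also enter at level 2 (++)
}

def grade_score(design: str, downgrades: int = 0, upgrades: int = 0) -> int:
    """Apply down-/upgrade points to a design's starting level,
    keeping the final score within the 1-4 range."""
    start = STARTING_LEVEL[design]
    return max(1, min(4, start - downgrades + upgrades))

# An RCT with one serious limitation scores 3 (+++):
assert grade_score("rct", downgrades=1) == 3
# An observational study is not re-penalized for lack of randomization,
# but may be upgraded, e.g. for a large effect:
assert grade_score("observational", upgrades=1) == 3
# Scores never fall below 1, however many downgrades apply:
assert grade_score("observational", downgrades=3) == 1
```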

3.2 GRADE quality assessment criteria

Each study should be reviewed using the following criteria, while recognizing that application of the criteria is a subjective process and open to individual interpretation. For example, how similar studies are to each other in their estimate of effect, and whether any differences warrant a point reduction for inconsistency, is a subjective judgement, although it is guided by review of point estimates, confidence intervals, and values of the I² statistic of heterogeneity20. Furthermore, it may be very difficult to assess conclusively whether publication bias is occurring; this may require detailed information held by the specific study team, or close attention to missing data that should have been collected during the study. WGs are particularly well positioned to comment on this parameter through their expertise in the field. Documenting the process in an open and transparent manner will allow others to review the process and propose alternative interpretations for consideration. The boxes below outline the criteria for down- and upgrading the strength of evidence (also see Appendix 4). The descriptions below are general and brief, with specific instructions on how to apply GRADE in the area of vaccines and vaccination; more detailed information may be found in the GRADE documents.
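As a point of reference for the heterogeneity review mentioned above, the I² index can be derived from Cochran's Q statistic, which most meta-analysis software reports. The sketch below is illustrative only (the function name is our own):

```python
def i_squared(q: float, num_studies: int) -> float:
    """I² heterogeneity index (as a percentage) from Cochran's Q:
    I² = max(0, (Q - df) / Q) * 100, where df = number of studies - 1."""
    if q <= 0:
        return 0.0
    df = num_studies - 1
    return max(0.0, (q - df) / q) * 100.0

# Q = 20 across 5 studies (df = 4) gives I² = 80%, i.e. most of the
# observed variation reflects real between-study heterogeneity:
assert i_squared(20.0, 5) == 80.0
# Q at or below its expected value under homogeneity gives I² = 0:
assert i_squared(3.0, 5) == 0.0
```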

20 Huedo-Medina TB, Sánchez-Meca J, Marín-Martínez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I² index? (2006). CHIP Documents, Paper 19. http://digitalcommons.uconn.edu/chip_docs/19


Box 1. Criteria used to downgrade studies

Limitations: Studies may be downgraded by 1 or 2 points for serious or very serious methodological limitations. Examples of these limitations include inappropriate randomization; lack of concealment; violation of the intention-to-treat principle; inadequate blinding; substantial loss to follow-up; and early stopping for benefit. (See section 2.4 for how to evaluate risks of bias due to methodological limitations.)

Inconsistency: Studies may be downgraded by 1 or 2 points if the effect is not similar across studies and the inconsistencies are serious or very serious.

Indirectness: Studies may be downgraded by 1 or 2 points if there are serious or very serious issues with indirectness. Examples of indirectness include the use of surrogate end points, indirect comparisons between two treatments, problems with generalizability to the population of interest, and test inaccuracies. It is suggested that, when assessing clinical protection, immunogenicity studies not be downgraded when there are well-established standard correlates of protection.

Imprecision: Studies may be downgraded by 1 or 2 points if there is serious or very serious imprecision, i.e. confidence intervals are wide or very wide.

Reporting bias: Studies may be downgraded by 1 or 2 points if publication bias is likely or very likely.

Box 2. Criteria used to upgrade studies

Large effect/strength of association: Studies may be upgraded by 1 point if there is evidence from RCTs or observational (including surveillance) studies of vaccine effectiveness of 50% or higher (OR/RR >2 or <0.5), based on consistent evidence from two or more studies with no major21 confounders. Studies may be upgraded by 2 points if there is strong evidence from RCTs or observational studies of vaccine effectiveness of 80% or higher (or, depending on the outcome, an OR/RR >5 or <0.2), based on consistent evidence from two or more studies with no major confounders.

Dose-response gradient: Studies may be upgraded if there is evidence of a dose-response gradient at the population level. Increase by 1 point if there is evidence of risk reduction in disease incidence with increasing population vaccine coverage. Evidence of decreased risk with increased vaccine coverage includes evidence of reversal at the population level (disease returns when vaccine coverage decreases) and evidence of risk reduction in older or younger age groups not targeted for the intervention but who benefit from herd immunity. Increase by 2 points if there is very strong evidence of risk reduction with increasing population vaccine coverage.

Antagonistic bias and major confounders:22 Studies may be upgraded by 1 point if all major confounders would have reduced the effect.

Good quality study design: Increase by 1 point if the study design was of good quality and controlled for confounding and differential biases among cases and controls, e.g. with population-based record linkage, self-controlled case series, or other appropriate designs. The score may be further upgraded by 1 point if there is consistency between studies across different settings, different investigators, and different designs.23

21 Changed from "plausible" confounders in the formal GRADE framework. 22 This criterion has been slightly modified from the GRADE criteria, which specify that all "plausible" confounders would have reduced the effect. 23 This criterion is not included in the formal GRADE framework. It is only applicable to observational studies.
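The large-effect thresholds in Box 2 can be cross-checked by converting vaccine effectiveness to the risk ratio it implies (VE = (1 − RR) × 100). The sketch below is illustrative only, uses function names of our own, and assumes the other Box 2 conditions (consistent evidence from two or more studies, no major confounders) are already met:

```python
def rr_from_effectiveness(ve_percent: float) -> float:
    """Risk ratio implied by a vaccine effectiveness: RR = 1 - VE/100."""
    return 1.0 - ve_percent / 100.0

def large_effect_upgrade(ve_percent: float) -> int:
    """Upgrade points under the Box 2 large-effect criterion, assuming
    consistent evidence from >=2 studies with no major confounders."""
    rr = rr_from_effectiveness(ve_percent)
    if rr <= 0.2:    # VE of 80% or higher -> upgrade by 2 points
        return 2
    if rr <= 0.5:    # VE of 50% or higher -> upgrade by 1 point
        return 1
    return 0

# VE of 85% (RR ~0.15) warrants a 2-point upgrade; VE of 60% (RR ~0.4)
# a 1-point upgrade; VE of 30% (RR ~0.7) no upgrade:
assert large_effect_upgrade(85.0) == 2
assert large_effect_upgrade(60.0) == 1
assert large_effect_upgrade(30.0) == 0
```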


In the GRADE table, ratings are clearly indicated. For reductions in score, possible ratings are "none serious" (no downgrade), "serious" (downgrade by 1 point), or "very serious" (downgrade by 2 points). For increases in score, possible ratings are "not applicable" (no upgrade), "strong evidence" (upgrade by 1 point), or "very strong evidence" (upgrade by 2 points). Final scores cannot exceed 4 points or drop below 1. Whenever a downgrade or upgrade is applied, a footnote is needed to explain the rationale for the change in score. For example, current studies evaluating HPV vaccine efficacy may be downgraded under the criterion of "indirectness" at this time, due to the use of surrogate endpoints in measuring vaccine efficacy; a footnote would be required explaining this in the table. In some cases, studies may not be downgraded, but footnotes should still be used to highlight any potential issues. This promotes transparency and shows readers that the full range of issues has been considered. The decision to downgrade or upgrade a body of evidence is often subjective and depends on individual judgement. While two individuals may agree on the study limitations during a review of the evidence, whether or not such limitations warrant a change in score may not be clear. Similarly, the amount of variation in results from multiple studies allowed before they are deemed inconsistent may be contentious. These examples illustrate the subjective nature of the exercise, the importance of expert opinion in the interpretation and assessment of the criteria, and the need to explain one's thought process throughout the evaluation so that areas of agreement and disagreement are evident.

3.3 Quality of evidence rating

Using the criteria described above, individual studies and the collective body of evidence should be evaluated. The collection of studies will receive a score based upon analysis of the component studies. The quality of the scientific evidence is scored using the GRADE scale:

• We are very confident that the true effect lies close to that of the estimate of effect on health outcome (score of 4, or ⊕⊕⊕⊕)

• We are moderately confident in the estimate of effect on health outcome: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different (score of 3, or ⊕⊕⊕)

• Our confidence in the estimate of the effect on the health outcome is limited: The true effect may be substantially different from the estimate of the effect (score of 2, or ⊕⊕)

• We have very little confidence in the estimate of the effect on the health outcome. The true effect is likely to be substantially different from the estimate of effect (score of 1, or ⊕)

The GRADE tables explicitly provide the score of the outcomes critical to the recommendation. These factors help inform whether or not a recommendation should be made.

3.4 Application of GRADE to Recommendations


Under the formal GRADE approach developed by the GRADE Working Group, scoring is also applied to the recommendations themselves (i.e. strong versus weak or conditional recommendations). WHO and SAGE have decided not to grade vaccine recommendations in this way, as weak recommendations are of little value to country immunization programs. It is the goal of WHO and SAGE to provide only strong recommendations, which may be either for or against an activity, or may be condition-dependent.24 An informal review of WHO vaccine recommendations was undertaken in the summer of 2010. WHO and SAGE will now review all recommendations for their strength and refrain from making ambiguous or weak recommendations.

3.5 Presentation of GRADE Tables

GRADE tables are available on the IVB website together with the published vaccine position papers. In the body of the text of position papers, GRADE tables are cited as footnotes. They may be updated when new evidence becomes available. If additional evidence provides further scientific support for the recommendations, the GRADE tables may be updated by WHO without updating the position paper. If new evidence arises that necessitates a re-evaluation of the vaccine position paper recommendations, a more formal updating process will be initiated. To construct GRADE tables, data extraction and quality review at the level of individual studies will be undertaken. Documentation of this process should be available upon request, but will not be published on the IVB website as part of the GRADE tables. A review of the totality of evidence will be done for each question, with multiple tables if the necessary data come from a variety of study designs. GRADE tables will be presented in the context of the question of interest, the setting, and the final recommendation that arises from the full review. See Appendix 5 for an example of a GRADE table.

4. Vaccine Recommendation Development - Beyond Scoring the Evidence

4.1 Other considerations when making recommendations

Much work goes into the information gathering and synthesis that form the basis of vaccine recommendations and guidance. Even recommendations that do not utilize a formal GRADE evaluation are the product of data review, discussion, and deliberation. In addition to the scientific evidence base, other factors are important to the final recommendation. GRADE has outlined the following parameters to be considered.

24 In the formal GRADE framework, a conditional recommendation is synonymous with a weak recommendation. For WHO and SAGE, a conditional recommendation is a strong recommendation constrained to a particular subpopulation or country after having met given criteria. For example, a second dose of measles vaccine in national schedules is not recommended until a country has achieved 80% coverage of the first dose of measles vaccine for the last 3 years.


Box 3. Considerations in recommendation development

1. Effectiveness and safety of the intervention, with evidence quality scored by e.g. GRADE or SCOPE
2. Disease epidemiology, clinical characteristics, and economic considerations, with evidence assessed by systematic literature review and critical appraisal
3. Balance between benefits and risks
4. Opportunities for capitalizing on prevailing societal values and preferences
5. Immunization program concerns (e.g. cold chain, logistics, vaccine availability, fit with other vaccine schedules, ability to deliver, resources needed, impact on budget)
6. Social, cultural, ethical and legal issues

All of these factors are taken into consideration when recommendations are proposed. The question of costs and resources at the global level is particularly challenging, again highlighting the need for transparent review of the data and key issues so that countries may make their own decisions and prioritize health interventions. SAGE should consider the societal perspective when evaluating cost and resource implications. The decision to implement a program will always involve trade-offs, which must be carefully reviewed at the national level prior to adoption of recommendations. Societal values are critical factors that have a strong impact on vaccine policy decisions, such as the timing of vaccination, whether it is mandated, the number of doses for optimal protection, and the goals of a program. It is only after careful review of the evidence, risk-benefit ratio, values, and feasibility that recommendations are made.

4.2 Updating Recommendations

As the evidence and/or other factors change, recommendations will be updated to reflect the best data available. Position papers are reviewed periodically by WHO staff to determine when a full update of a position paper is warranted. In some cases, a brief update may be sufficient. For example, in 2007 WHO recommended adoption of rotavirus vaccine only in countries where effectiveness data were available. After such trials were conducted in Africa and Asia, WHO published a brief update in 2009 in which it recommended inclusion of rotavirus vaccine in all national schedules.25 When a full update is needed, SAGE comprehensively reviews the evidence to update recommendations.

4.3 Emergency situations

When outbreaks, natural disasters, or humanitarian emergencies occur, lack of time and of context-specific data may necessitate a modified process for the development of recommendations. Quick decisions may be needed that rely on indirect data interpreted using expert judgement. In such situations, recommendations may be issued quickly and revised as the context changes and/or additional data become available.

25 For example: http://www.who.int/entity/wer/2009/wer8451_52.pdf


5. Conclusions

Evidence-based vaccine policy is critical for the development of global recommendations. Creating guidance for vaccine use with various products in different geographic and cultural contexts is a challenging endeavour that must have a foundation in the best scientific evidence available. The approach described above represents thinking from a range of immunization experts on how best to apply a rigorous approach to evaluating the quality of scientific evidence. Judgements will always be necessary in policy development, requiring transparency throughout the process. These guidelines are intended to increase the transparency and standardization of the development of WHO vaccine recommendations.


Appendix 1. Data Items to Consider for Extraction from Included Studies

Data extraction forms should be tailored for each systematic review. The data items below represent key fields to consider including in the data extraction form when appropriate.

1. Study Author, Year
2. Name of reviewer
3. Date of review

4. Methods

4.1. Study design
4.2. Source of sample(s)
4.3. Sampling method
4.4. Sample size
4.5. Entry criteria/exclusions
4.6. Non-respondents/Loss to follow up
4.7. Which parts of the study were prospective

5. Participants

5.1. Setting
5.2. Country
5.3. Age (range and mean/median)
5.4. Gender (% male/female)
5.5. Ethnicity
5.6. Control group
5.7. Definition of controls
5.8. Source of controls
5.9. Comparability

5.9.1. Potential confounders identified
5.9.2. Baseline assessment of outcome variables

6. Group Allocation

6.1. Randomization
6.1.1. Sequence generation
6.1.2. Allocation sequence concealment
6.1.3. Blinding

6.2. Allocation by
6.2.1. Quasi-randomization
6.2.2. Time differences
6.2.3. Location differences
6.2.4. Treatment decisions
6.2.5. Participants' preferences
6.2.6. On the basis of outcome
6.2.7. Other important processes

7. Intervention

7.1. Vaccine (formulation, dose, etc.)
7.2. Length of follow up

8. Outcomes

8.1. How defined


8.2. Intervals at which outcomes were assessed
8.3. Validity
8.4. Reproducibility
8.5. Quality control
8.6. Missing/incomplete data
8.7. Selective reporting

9. Summary of Results

10. Summary of Possible Risks of Bias

10.1. Selection Bias
10.2. Information Bias
10.3. Confounding


Appendix 2. Checklists for Reviewing Study Quality

Courtesy of the Critical Appraisal Skills Programme

Appendix 2a. Checklist for RCTs Appendix 2b. Checklist for Case-Control Studies Appendix 2c. Checklist for Cohort Studies Appendix 2d. Checklist for Systematic Reviews

© Public Health Resource Unit, England (2006). All rights reserved.

Critical Appraisal Skills Programme (CASP) making sense of evidence

10 questions to help you make sense of randomised controlled trials

How to use this appraisal tool

Three broad issues need to be considered when appraising the report of a randomised controlled trial:

• Is the trial valid?

• What are the results?

• Will the results help locally?

The 10 questions on the following pages are designed to help you think about these issues systematically.

The first two questions are screening questions and can be answered quickly. If the answer to both is “yes”, it is worth proceeding with the remaining questions.

You are asked to record a “yes”, “no” or “can’t tell” to most of the questions. A number of italicised prompts are given after each question.

These are designed to remind you why the question is important. Record your reasons for your answers in the spaces provided. The 10 questions are adapted from Guyatt GH, Sackett DL, and Cook DJ, Users’ guides to the medical literature. II. How to use an article about therapy or prevention. JAMA 1993; 270 (21): 2598-2601 and JAMA 1994; 271(1): 59-63


No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the Public Health Resource Unit. If permission is given, then copies must include this statement together with the words “© Public Health Resource Unit, England 2006”. However, NHS organisations may reproduce or use the publication for non-commercial educational purposes provided the source is acknowledged.


Screening Questions

1. Did the study ask a clearly-focused question? Yes Can’t tell No

Consider if the question is ‘focused’ in terms of:

– the population studied

– the intervention given

– the outcomes considered

2. Was this a randomised controlled trial (RCT), and was it appropriately so? Yes Can’t tell No

Consider:

– why this study was carried out as an RCT

– if this was the right research approach for the question being asked

Is it worth continuing?

Detailed Questions

3. Were participants appropriately allocated to intervention and control groups? Yes Can’t tell No

Consider:

– how participants were allocated to intervention and control groups. Was the process truly random?

– whether the method of allocation was described. Was a method used to balance the randomization, e.g. stratification?

– how the randomization schedule was generated and how a participant was allocated to a study group

– if the groups were well balanced. Are any differences between the groups at entry to the trial reported?

– if there were differences reported that might have explained any outcome(s) (confounding)


4. Were participants, staff and study personnel ‘blind’ to participants’ study group? Yes Can’t tell No

Consider:

– the fact that blinding is not always possible

– if every effort was made to achieve blinding

– if you think it matters in this study

– the fact that we are looking for ‘observer bias’

5. Were all of the participants who entered the trial accounted for at its conclusion? Yes Can’t tell No

Consider:

– if any intervention-group participants got a control-group option or vice versa

– if all participants were followed up in each study group (was there loss-to-follow-up?)

– if all the participants’ outcomes were analysed by the groups to which they were originally allocated (intention-to-treat analysis)

– what additional information you would have liked to see to make you feel better about this

6. Were the participants in all groups followed up and data collected in the same way? Yes Can’t tell No

Consider:

– if, for example, they were reviewed at the same time intervals and if they received the same amount of attention from researchers and health workers. Any differences may introduce performance bias.

7. Did the study have enough participants to minimise the play of chance? Yes Can’t tell No

Consider:

– if there is a power calculation. This will estimate how many participants are needed to be reasonably sure of finding something important (if it really exists and for a given level of uncertainty about the final result).
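As a rough illustration of what a power calculation does, the sketch below estimates the per-group sample size needed to compare two event proportions. It uses the common normal-approximation formula with hypothetical inputs; it is not the method any particular trial is assumed to have used.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for detecting a difference
    between two proportions (two-sided test, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Halving an attack rate of 10% to 5% needs several hundred participants per arm.
n_per_arm = sample_size_two_proportions(0.10, 0.05)
```

A trial much smaller than such an estimate is unlikely to distinguish a real effect of that size from the play of chance.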


8. How are the results presented and what is the main result?

Consider:

– if, for example, the results are presented as a proportion of people experiencing an outcome, such as risks, or as a measurement, such as mean or median differences, or as survival curves and hazards

– how large the result is and how meaningful it is

– how you would sum up the bottom-line result of the trial in one sentence

9. How precise are these results?

Consider:

– if the result is precise enough to make a decision

– if a confidence interval was reported. Would your decision about whether or not to use this intervention be the same at the upper confidence limit as at the lower confidence limit?

– if a p-value is reported where confidence intervals are unavailable
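To make the precision question concrete, here is a minimal sketch that computes a 95% confidence interval for an observed event proportion. It uses the simple Wald (normal-approximation) interval with made-up counts, purely for illustration.

```python
from statistics import NormalDist

def wald_ci(events, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = events / n
    half_width = z * (p * (1 - p) / n) ** 0.5
    return max(0.0, p - half_width), min(1.0, p + half_width)

low, high = wald_ci(30, 100)   # 30 events among 100 participants
# The checklist's question: would your decision be the same at both limits?
```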

10. Were all important outcomes considered so the results can be applied?  Yes  Can’t tell  No

Consider whether:
– the people included in the trial could be different from your population in ways that would produce different results
– your local setting differs much from that of the trial
– you can provide the same treatment in your setting

Consider outcomes from the point of view of the:
– individual
– policy maker and professionals
– family/carers
– wider community

Consider whether:
– any benefit reported outweighs any harm and/or cost. If this information is not reported, can it be filled in from elsewhere?
– policy or practice should change as a result of the evidence contained in this trial


Critical Appraisal Skills Programme (CASP): making sense of evidence

11 questions to help you make sense of a case control study

How to use this appraisal tool

Three broad issues need to be considered when appraising a case control study:
• Are the results of the study valid?

• What are the results?

• Will the results help locally?

The 11 questions on the following pages are designed to help you think about these issues systematically.

The first two questions are screening questions and can be answered quickly. If the answer to both is “yes”, it is worth proceeding with the remaining questions.

There is a fair degree of overlap between several of the questions.

You are asked to record a “yes”, “no” or “can’t tell” to most of the questions.

A number of italicised prompts are given after each question. These are designed to remind you why the question is important. Record your reasons for your answers in the spaces provided.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the Public Health Resource Unit. If permission is given, then copies must include this statement together with the words “© Public Health Resource Unit, England 2006”. However, NHS organisations may reproduce or use the publication for non-commercial educational purposes provided the source is acknowledged.

A/ Are the results of the study valid?

Screening Questions

1. Did the study address a clearly focused issue?  Yes  Can’t tell  No

A question can be focused in terms of:
– the population studied
– the risk factors studied
– whether the study tried to detect a beneficial or harmful effect

2. Did the authors use an appropriate method to answer their question?  Yes  Can’t tell  No

Consider:
– is a case control study an appropriate way of answering the question under the circumstances? (is the outcome rare or harmful?)
– did it address the study question?

Is it worth continuing?

Detailed Questions

3. Were the cases recruited in an acceptable way?  Yes  Can’t tell  No

HINT: We are looking for selection bias which might compromise the validity of the findings:

– Are the cases defined precisely?
– Were the cases representative of a defined population (geographically and/or temporally)?
– Was there an established reliable system for selecting all the cases?
– Are they incident or prevalent?
– Is there something special about the cases?
– Is the time frame of the study relevant to the disease/exposure?
– Was there a sufficient number of cases selected?
– Was there a power calculation?

4. Were the controls selected in an acceptable way?  Yes  Can’t tell  No

HINT: We are looking for selection bias which might compromise the generalisability of the findings:

– Were the controls representative of a defined population (geographically and/or temporally)?
– Was there something special about the controls?
– Was the non-response high? Could non-respondents be different in any way?
– Are they matched, population based or randomly selected?
– Was there a sufficient number of controls selected?

5. Was the exposure accurately measured to minimise bias?  Yes  Can’t tell  No

HINT: We are looking for measurement, recall or classification bias:

– Was the exposure clearly defined and accurately measured?
– Did the authors use subjective or objective measurements?
– Do the measures truly reflect what they are supposed to measure? (have they been validated?)
– Were the measurement methods similar in cases and controls?
– Did the study incorporate blinding where feasible?
– Is the temporal relation correct? (does the exposure of interest precede the outcome?)

6. A. What confounding factors have the authors accounted for? List the other ones you think might be important, that the authors missed (genetic, environmental and socio-economic).

B. Have the authors taken account of the potential confounding factors in the design and/or in their analysis?  Yes  Can’t tell  No

HINT: Look for restriction in design, and techniques, e.g. modelling, stratified-, regression-, or sensitivity analysis to correct, control or adjust for confounding factors.
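One standard stratified-analysis technique of the kind the hint refers to is the Mantel-Haenszel pooled odds ratio. The sketch below (a simplified illustration with hypothetical counts, not part of the CASP tool) adjusts for one categorical confounder by pooling the association over its strata:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio. Each stratum of the confounder
    is a tuple (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two hypothetical age strata with different case/control mixes:
pooled = mantel_haenszel_or([(20, 80, 10, 90), (40, 60, 25, 75)])
```

Comparing the pooled estimate with the crude (unstratified) odds ratio shows how much of the apparent association the confounder explains.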

B/ What are the results?

7. What are the results of this study?

Consider:
– What are the bottom line results?
– Is the analysis appropriate to the design?
– How strong is the association between exposure and outcome (look at the odds ratio)?
– Are the results adjusted for confounding, and might confounding still explain the association?
– Has adjustment made a big difference to the OR?
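For readers less familiar with the odds ratio, this small sketch computes a crude OR and its 95% confidence interval from a 2x2 table. The counts are hypothetical and the 1.96 critical value is hard-coded for a 95% interval.

```python
import math

def odds_ratio_ci(a, b, c, d):
    """Crude odds ratio with a 95% CI from a 2x2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - 1.96 * se_log_or)
    high = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, low, high

# 20 of 100 cases exposed vs 10 of 100 controls exposed:
or_, low, high = odds_ratio_ci(20, 80, 10, 90)   # OR = 2.25
```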

8. How precise are the results? How precise is the estimate of risk?

Consider:
– Size of the P-value
– Size of the confidence intervals
– Have the authors considered all the important variables?
– How was the effect of subjects refusing to participate evaluated?

9. Do you believe the results?  Yes  No

Consider:
– Big effect is hard to ignore!
– Can it be due to chance, bias or confounding?
– Are the design and methods of this study sufficiently flawed to make the results unreliable?
– Consider Bradford Hill’s criteria (e.g. time sequence, dose-response gradient, strength, biological plausibility)

Is it worth continuing?

C/ Will the results help me locally?

10. Can the results be applied to the local population?  Yes  Can’t tell  No

Consider whether:

– The subjects covered in the study could be sufficiently different from your population to cause concern.

– Your local setting is likely to differ much from that of the study.

– Can you estimate the local benefits and harms?

11. Do the results of this study fit with other available evidence?  Yes  Can’t tell  No

HINT: Consider all the available evidence from RCTs, systematic reviews, cohort studies and case-control studies as well for consistency.

One observational study rarely provides sufficiently robust evidence to recommend changes to clinical practice or within health policy decision making. However, for certain questions observational studies provide the only evidence. Recommendations from observational studies are always stronger when supported by other evidence.


Critical Appraisal Skills Programme Page 1 09/01/04

CRITICAL APPRAISAL SKILLS PROGRAMME

making sense of evidence

12 questions to help you make sense of a cohort study

General comments

• Three broad issues need to be considered when appraising a cohort study.

Are the results of the study valid?

What are the results?

Will the results help locally?

The 12 questions on the following pages are designed to help you think about these issues systematically.

• The first two questions are screening questions and can be answered quickly. If the answer to those two is "yes", it is worth proceeding with the remaining questions.

• There is a fair degree of overlap between several of the questions.

• You are asked to record a "yes", "no" or "can't tell" to most of the questions.

• A number of italicised hints are given after each question. These are designed to remind you why the question is important. There will not be time in the small groups to answer them all in detail!

© Critical Appraisal Skills Programme (CASP) 2004. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior permission of CASP. However, organisations may reproduce or use the publication for non-commercial educational purposes provided the source is acknowledged. Enquiries concerning reproduction or use in other circumstances should be addressed to CASP.


A/ Are the results of the study valid?

Screening Questions

1. Did the study address a clearly focused issue?  Yes  Can't tell  No

HINT: A question can be focused in terms of:
- the population studied
- the risk factors studied
- the outcomes considered
- is it clear whether the study tried to detect a beneficial or harmful effect?

2. Did the authors use an appropriate method to answer their question?  Yes  Can't tell  No

HINT: Consider
- Is a cohort study a good way of answering the question under the circumstances?
- Did it address the study question?

Is it worth continuing?

Detailed Questions

3. Was the cohort recruited in an acceptable way?  Yes  Can't tell  No

HINT: We are looking for selection bias which might compromise the generalisability of the findings:
- Was the cohort representative of a defined population?
- Was there something special about the cohort?
- Was everybody included who should have been included?


4. Was the exposure accurately measured to minimize bias?  Yes  Can't tell  No

HINT: We are looking for measurement or classification bias:
- Did they use subjective or objective measurements?
- Do the measures truly reflect what you want them to (have they been validated)?
- Were all the subjects classified into exposure groups using the same procedure?

5. Was the outcome accurately measured to minimize bias?  Yes  Can't tell  No

HINT: We are looking for measurement or classification bias:
- Did they use subjective or objective measurements?
- Do the measures truly reflect what you want them to (have they been validated)?
- Has a reliable system been established for detecting all the cases (for measuring disease occurrence)?
- Were the measurement methods similar in the different groups?
- Were the subjects and/or the outcome assessor blinded to exposure (does this matter)?

6. A. Have the authors identified all important confounding factors?  Yes  Can't tell  No

List the ones you think might be important, that the authors missed:

B. Have they taken account of the confounding factors in the design and/or analysis?  Yes  Can't tell  No

HINT: Look for restriction in design, and techniques eg modelling, stratified-, regression-, or sensitivity analysis to correct, control or adjust for confounding factors.


7. A. Was the follow up of subjects complete enough?  Yes  Can't tell  No

B. Was the follow up of subjects long enough?  Yes  Can't tell  No

HINT:
- The good or bad effects should have had long enough to reveal themselves
- The persons that are lost to follow-up may have different outcomes than those available for assessment
- In an open or dynamic cohort, was there anything special about the outcome of the people leaving, or the exposure of the people entering the cohort?

B/ What are the results?

8. What are the results of this study?

HINT:
- What are the bottom line results?
- Have they reported the rate or the proportion between the exposed/unexposed, the ratio/the rate difference?
- How strong is the association between exposure and outcome (RR)?
- What is the absolute risk reduction (ARR)?
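The effect measures named in the hints can be illustrated with a short sketch using hypothetical counts, treating the vaccinated group as the "exposed" cohort:

```python
def cohort_measures(exp_events, exp_n, unexp_events, unexp_n):
    """Risk ratio (RR) and absolute risk reduction (ARR) from cohort counts."""
    risk_exp = exp_events / exp_n
    risk_unexp = unexp_events / unexp_n
    rr = risk_exp / risk_unexp
    arr = risk_unexp - risk_exp   # absolute risk reduction
    return rr, arr

# 5 cases among 1000 vaccinated vs 20 among 1000 unvaccinated:
rr, arr = cohort_measures(5, 1000, 20, 1000)   # RR = 0.25, ARR about 0.015
```

A small RR can still correspond to a tiny ARR when the outcome is rare, which is why the checklist asks for both.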

9. How precise are the results? How precise is the estimate of the risk?

HINT:
- Size of the confidence intervals

10. Do you believe the results?  Yes  Can't tell  No

HINT:
- Big effect is hard to ignore!
- Can it be due to bias, chance or confounding?
- Are the design and methods of this study sufficiently flawed to make the results unreliable?
- Consider Bradford Hill's criteria (eg time sequence, dose-response gradient, biological plausibility, consistency).


Is it worth continuing?

C/ Will the results help me locally?

11. Can the results be applied to the local population?  Yes  Can't tell  No

HINT: Consider whether
- The subjects covered in the study could be sufficiently different from your population to cause concern.
- Your local setting is likely to differ much from that of the study.
- Can you quantify the local benefits and harms?

12. Do the results of this study fit with other available evidence?  Yes  Can't tell  No

One observational study rarely provides sufficiently robust evidence to recommend changes to clinical practice or within health policy decision making. However, for certain questions observational studies provide the only evidence. Recommendations from observational studies are always stronger when supported by other evidence.

Critical Appraisal Skills Programme (CASP): making sense of evidence

10 questions to help you make sense of reviews

How to use this appraisal tool

Three broad issues need to be considered when appraising the report of a systematic review:
• Is the study valid?

• What are the results?

• Will the results help locally?

The 10 questions on the following pages are designed to help you think about these issues systematically.

The first two questions are screening questions and can be answered quickly. If the answer to both is “yes”, it is worth proceeding with the remaining questions.

You are asked to record a “yes”, “no” or “can’t tell” to most of the questions. A number of italicised prompts are given after each question.

These are designed to remind you why the question is important. Record your reasons for your answers in the spaces provided. The 10 questions are adapted from Oxman AD, Cook DJ, Guyatt GH, Users’ guides to the medical literature. VI. How to use an overview. JAMA 1994; 272 (17): 1367-1371


Screening Questions

1. Did the review ask a clearly-focused question? Yes Can’t tell No

Consider if the question is ‘focused’ in terms of:

– the population studied

– the intervention given or exposure

– the outcomes considered

2. Did the review include the right type of study? Yes Can’t tell No

Consider if the included studies:

– address the review’s question

– have an appropriate study design

Is it worth continuing?

Detailed Questions

3. Did the reviewers try to identify all relevant studies?  Yes  Can’t tell  No

Consider:

– which bibliographic databases were used

– if there was follow-up from reference lists

– if there was personal contact with experts

– if the reviewers searched for unpublished studies

– if the reviewers searched for non-English-language studies

4. Did the reviewers assess the quality of the included studies?  Yes  Can’t tell  No

Consider:

– if a clear, pre-determined strategy was used to determine which studies were included. Look for:

– a scoring system

– more than one assessor


5. If the results of the studies have been combined, was it reasonable to do so?  Yes  Can’t tell  No

Consider whether:

– the results of each study are clearly displayed

– the results were similar from study to study (look for tests of heterogeneity)

– the reasons for any variations in results are discussed
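A "test of heterogeneity" can be sketched as follows. This is a simplified Cochran's Q and I² computed on hypothetical per-study effect estimates; real reviews use dedicated meta-analysis software.

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I^2 statistic from per-study effect estimates
    (e.g. log odds ratios) and their variances, inverse-variance weighted."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: fraction of observed variation beyond what chance would produce
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Three similar log-OR estimates: low Q and I^2, so pooling looks reasonable.
q, i2 = heterogeneity([0.40, 0.45, 0.42], [0.02, 0.03, 0.025])
```

A high I² suggests the studies disagree more than chance allows, and the review should discuss why before pooling.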

6. How are the results presented and what is the main result?

Consider:

– how the results are expressed (e.g. odds ratio, relative risk, etc.)

– how large the result is and how meaningful it is

– how you would sum up the bottom-line result of the review in one sentence

7. How precise are these results?

Consider:

– if a confidence interval was reported. Would your decision about whether or not to use this intervention be the same at the upper confidence limit as at the lower confidence limit?

– if a p-value is reported where confidence intervals are unavailable


8. Can the results be applied to the local population?  Yes  Can’t tell  No

Consider whether:

– the population sample covered by the review could be different from your population in ways that would produce different results

– your local setting differs much from that of the review

– you can provide the same intervention in your setting

9. Were all important outcomes considered? Yes Can’t tell No

Consider outcomes from the point of view of the:

– individual

– policy makers and professionals

– family/carers

– wider community

10. Should policy or practice change as a result of the evidence contained in this review?  Yes  Can’t tell  No

Consider:

– whether any benefit reported outweighs any harm and/or cost. If this information is not reported can it be filled in from elsewhere?



Appendix 3. Draft Summary Table for Evidence Review

Study Authors | Year | Location | Study Population | Vaccination/Intervention | Methods | Relevant Outcomes | Limitations/Potential Sources of Bias | Comments


Appendix 4. Rating the Quality of the Evidence

Quality of evidence levels:
(4) We are very confident that the true effect lies close to that of the estimate of effect on the health outcome.
(3) We are moderately confident in the estimate of effect on the health outcome. The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
(2) Our confidence in the estimate of the effect on the health outcome is limited. The true effect may be substantially different from the estimate of the effect.
(1) We have very little confidence in the estimate of the effect on the health outcome. The true effect is likely to be substantially different from the estimate of effect.

Quality starting factor is first assigned based on study design:
– Randomised trials start at high (4).
– Observational studies, disease surveillance and post-market safety surveillance data start at low (2).

Quality score is lowered¹ if:
1) Limitation of design:² -1 Serious, -2 Very serious
2) Inconsistency: -1 Serious, -2 Very serious
3) Indirectness:² -1 Serious, -2 Very serious
4) Imprecision: -1 Serious, -2 Very serious
5) Publication bias: -1 Likely, -2 Very likely

Quality score is raised¹ if:
1) Strength of association: +1 RR or OR >2 (or <0.5) in 2+ studies; +2 RR or OR >5 (or <0.2) in 2+ studies
2) Dose response (population based): +1 Evidence of decreased risk with increased vaccine coverage, including evidence of reversal at population level (disease returns when vaccine coverage is decreased); +2 Very strong evidence of decreased risk with increased coverage
3) Antagonistic bias and confounding: +1 All major confounders would have reduced the effect, or +1 Ability of design to control for confounding and avoid biases; +2 If, in addition to design, consistency across different settings, different investigators, and possibly different designs

¹ 1 = move up or down one grade (for example from high (4) to intermediate (3)); 2 = move up or down two grades (for example from low (2) to high (4)).
² Should be commensurate with study design.
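The up- and down-grading arithmetic in the table above can be sketched as follows. This is a toy illustration of the scoring logic only, not official GRADE software; the design labels are hypothetical keys chosen for the example.

```python
def grade_score(design, downgrades=(), upgrades=()):
    """Start from the design-based score (RCT = 4, observational = 2),
    subtract 1-2 points per serious concern, add 1-2 points per
    strengthening factor, and clamp the result to the 1-4 scale."""
    start = {"rct": 4, "observational": 2}[design]
    score = start - sum(downgrades) + sum(upgrades)
    return max(1, min(4, score))

# An RCT body of evidence with one serious limitation drops from 4 to 3:
rct_score = grade_score("rct", downgrades=[1])
# Observational data with a strong association (+1) rises from 2 to 3:
obs_score = grade_score("observational", upgrades=[1])
```

The clamp reflects that the scale bottoms out at "very low" (1) and tops out at "high" (4) no matter how many factors apply.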


Appendix 5. Template of a GRADE table used to score the quality of evidence. Different study designs may be graded separately in different tables (e.g. RCT and observational studies) or only the highest quality design used while including consideration of other sources of evidence through footnotes and adjusting the score as appropriate.

Question necessary for recommendation development:

Quality Assessment (for each row, record a Rating and an Adjustment to score):

No. of studies / Starting score

Factors decreasing confidence:
– Limitation in study design
– Inconsistency
– Indirectness
– Imprecision
– Publication bias

Factors increasing confidence:
– Strength of association
– Dose-response
– Antagonistic bias and confounding

Final score

Summary of Findings

Conclusion:

Study Design | Final Score for Design
– RCTs
– Observational studies
– FINAL SCORE (highest score)