1
Evidence Based Policy, Evidence Grading Schemes and Entities, and Ethics in Complex Research Systems
Robert Boruch, University of Pennsylvania
September 14-15 2008
5th European Conference on Complex Systems
Jerusalem
2
Summary of Themes
People want to know “what works” so as to inform their decisions.
The scientific quality of evidence on “what works” is variable and often poor.
People’s access to information on what works through the internet is substantial.
Organizations and data bases have been created to (a) develop evidence grading schemes, (b) apply the schemes in systematic reviews of evidence from multiple studies, and (c) disseminate results through the internet.
3
The “What Works” Theme
“What works” refers here to estimating the effects of social, educational, or criminological interventions
In a statistically/scientifically unbiased way
And so as to generate a statistical statement of one’s confidence in the results.
4
Evidence Based Policy/Law: A Driver of Interest in What Works
US, Canada, UK, Israel (e.g. National Academy)
Sweden, Norway, Denmark
Australia, Malaysia, China
Mexico, others in Central America
Multinationals: OECD, World Bank
Others
5
Information Glut as Driver: Naïve Web Searches
A Google search on “evidence based” yields 9,660,000 links.
A Google search on “what works” yields 6,350,000 links (.21 seconds).
A Google search on “evidence based practice” yields 2,000,000 links (.42 seconds).
A Google search on “evidence based policy” yields 132,000 links (.35 seconds).
What are we to make of this?
6
Publication Rates in Education
20,000 articles on education published each year in English language journals
Only 2 to 5 of every 1,000 articles each year (roughly 40 to 100 of the 20,000) report on controlled trials of programs, policies, or practices to estimate effectiveness
For every curriculum package that has been tested in a controlled trial, there are 50-80 that are claimed to be effective based on no defensible scientific evidence.
7
Relevant Organizations Nested
National, State/Provincial, Municipal:
Policy or law
Agencies within nation, etc., e.g. National Science Foundations, Institute of Education Sciences (US), University Research
Programs and projects within agencies
Data bases and reports within projects
Users of information at each level, e.g. scientists, policy people, the public
8
International Organizations: NGOs
Cochrane Collaboration in Health Care: http://cochrane.org
Campbell Collaboration in education, welfare, crime and justice: http://campbellcollaboration.org
9
Two Examples Here
International Campbell Collaboration in education, welfare, crime and justice
What Works Clearinghouse in education (Institute of Education Sciences, US)
10
Data Bases in this Context
Evidence Grading Schemes currently focus on reports of statistical analyses of impact, not, as yet, on micro-records of individuals.
Example: 5-10 statistical reports (ingredients of part of a data base) evaluating the impact of conditional income transfer programs in developing regions
Example: the Cochrane Collaboration data base on randomized trials contains nearly 0.5 million such reports
“Meta-analysis” of results of multiple studies
11
C2 SPECTR
C2 Social, Psychological, Educational, and Criminological Trials Register
13,000+ entries on randomized and possibly randomized trials
Feeding into C2 systematic reviews
Feeding into the IES What Works Clearinghouse (USDE)
12
The Campbell Collaboration
Mission: since 2000, prepare, maintain and make accessible C2 systematic reviews of evidence on the effects of interventions (“what works”) to inform decision makers and stakeholders.
International and Multidisciplinary: Education, social welfare/services, crime and justice
http://campbellcollaboration.org
Precedent: Cochrane Collaboration in health (1993)
13
Nine Key Principles of C2: A Scientific Ethic
1. Collaborating across Nations and Disciplines
2. Building on Enthusiasm
3. Avoiding Duplication
4. Minimizing Bias
5. Keeping Current
6. Striving for Relevance
7. Promoting Access
8. Ensuring Quality
9. Maintaining Continuity
14
What are Evidence Grading Schemes (EGSs)?
These are inventories (guidance, checklists, scales) or processes that…
facilitate making transparent and uniform scientific judgments about…
The quality of evidence on effects of programs or practices or policies
15
C2’s and Others’ Major Evidence Grading Distinction on What Works
Randomized controlled trials yield the least biased and least equivocal evidence on “what works,” i.e. the effect of a new intervention (program, practice, etc.)
Alternative methods to estimate the effect of interventions yield more equivocal and more biased estimates of effect, e.g. “before-after” evaluations and other nonrandomized trials.
Both randomized trials and nonrandomized trials are important, but they must be separated in evidence grading schemes.
16
Example: Randomized Controlled Trial
Individuals or entities such as villages or organizations are randomly allocated to one of two or more interventions
The random allocation assures a fair comparison of the effects of the interventions
And the random allocation assures a statistically credible statement about confidence in the result, e.g. confidence interval and statistical tests
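As an illustration of the kind of statistical statement meant here, the following is a minimal sketch of an approximate 95% confidence interval for the difference between an intervention group and a control group. The function name `diff_in_means_ci`, the outcome scores, and the use of a normal approximation (multiplier 1.96) are all assumptions made for this example; none of them come from the slides.

```python
import math
import statistics


def diff_in_means_ci(treatment, control, z=1.96):
    """Return (difference, lower, upper) for mean(treatment) - mean(control).

    Uses a normal approximation with unpooled sample variances;
    z=1.96 gives an approximate 95% confidence interval.
    """
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se


# Hypothetical outcome scores, invented purely for illustration.
treatment_scores = [72, 75, 78, 80, 69, 74, 77, 81]
control_scores = [70, 68, 74, 71, 66, 73, 69, 72]

diff, lower, upper = diff_in_means_ci(treatment_scores, control_scores)
print(f"difference = {diff:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

With samples this small, a t-based interval would usually be preferred; the normal approximation is used only to keep the sketch short.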
17
More Specific Example
A new commercial curriculum package for math education is the intervention under investigation
The new curriculum is RANDOMLY allocated to half of a sample of 100 schools, with the remaining half of schools serving as a control group, so as to form two equivalent groups of schools (fair comparison)
The outcomes, such as achievement test scores, from the intervention group and the control group are compared
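The allocation step in this example can be sketched as follows. This is an illustrative sketch only: the school identifiers, the seed value, and the helper name `allocate_schools` are hypothetical, and a real trial would typically document and safeguard its randomization procedure more carefully.

```python
import random


def allocate_schools(school_ids, seed=None):
    """Randomly split school_ids into equal-sized intervention and control groups."""
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(school_ids)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical identifiers for the sample of 100 schools.
schools = [f"school_{i:03d}" for i in range(1, 101)]
intervention, control = allocate_schools(schools, seed=2008)
print(len(intervention), "schools receive the new curriculum;",
      len(control), "schools serve as controls")
```

Because chance alone decides which schools receive the curriculum, the two groups are expected to be equivalent on average, which is what makes the later comparison of outcomes fair.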
18
Entities and Evidence Grading Schemes for What Works
Cochrane Collaboration: Systematic reviews in health
Campbell Collaboration: crime, education, welfare
Society for Prevention Research (Prevention Science 2006)
What Works Clearinghouse, Institute of Education Sciences (WWC IES): http://whatworks.ed.gov
Food and Drug Administration, other regulatory agencies
National Registry of Evidence-based Programs and Practices
Others: California etc.
19
What are the Ingredients of EGSs?
Pre-specification of primary outcomes
Pre-specification of all analyses
Pre-specification of all measures
Control for assignment/selection bias
Appropriate comparison condition
Control for subject awareness of assigned intervention
Control for provider awareness of assigned intervention
Control for data collector awareness of assigned intervention
Assurances to participants to elicit disclosure
Intervention fidelity/Measurement of exposure
Control for contamination and co-intervention
Reliability and validity of exposure measures
Comparison condition fidelity
Reliability of outcome measures
Validity of outcome measures
Adherence to standards for data collection
Adjustment for differential attrition
Adjustment for overall loss to follow-up
Adjustment for missing data
Analysis meets statistical assumptions
Analysis consistent with study theory
Adjustment for multiple measures
Absence of or explanation for anomalous findings
20
WWC Aims
To be a trusted source of scientific evidence on what works, what does not, and on where evidence is absent…
Not to endorse products
http://www.whatworks.ed.gov
21
What Works Clearinghouse Illustration
22
Beginning Reading Review Protocol
The Beginning Reading What Works Clearinghouse (WWC) review focuses on reading interventions for students in grades K–3 (or ages 5-8) that are intended to increase skills in alphabetics (phonemic awareness, phonological awareness, letter recognition, print awareness and phonics), reading fluency, comprehension (vocabulary and reading comprehension), or general reading achievement. Interventions for this review are defined as programs, products, practices, or policies that are intended to increase skills in the areas named above. For the first set of intervention Beginning Reading reports, the WWC focused on “branded” programs and products.
Effectiveness ratings for Beginning Reading programs in four domains
Intervention Alphabetics Comprehension Fluency General reading achievement
DaisyQuest
Reading Recovery®
WWC Intervention Reports provide all findings that "Meet Evidence Standards" or "Meet Evidence Standards with Reservations" for studies on a particular intervention. Intervention reports are created for those interventions that have at least one study that "Meets Evidence Standards" or "Meets Evidence Standards with Reservations." Intervention reports are one component of the decision-making process, but should not be the sole source of information when making educational decisions.
Key
Positive effects: strong evidence of a positive effect with no overriding contrary evidence
Potentially positive effects: evidence of a positive effect with no overriding contrary evidence
Mixed effects: evidence of inconsistent effects
No discernible effects: no affirmative evidence of effects
Potentially negative effects: evidence of a negative effect with no overriding contrary evidence
Negative effects: strong evidence of a negative effect with no overriding contrary evidence
23
Example: C2 Parental Involvement Trials
500 possibly relevant studies of impact
45 Possible Randomized Controlled Trials (RCTs)
20 RCTs Met Study Inclusion Criteria
18 RCTs included in the Meta-Analysis
Nye, Turner, Schwartz
http://campbellcollaboration.org
24
Statistics for each study: Hedges's g with the lower and upper limits of its 95% CI

Study name           Comparison             Outcome       Hedges's g   Lower limit   Upper limit
Ryan (1964)          Parent_vs_Control      Combined           0.347         0.088         0.605
Aronson (1966)       Combined               Read_Ach           1.109         0.421         1.798
Clegg (1971)         Combined               Combined           0.776        -0.098         1.651
Hirst (1974)         Parent_vs_Control      Combined           0.181        -0.217         0.579
Henry (1974)         Combined               Combined           0.281        -0.677         1.239
O'Neil (1975)        Combined               Combined           0.223        -0.724         1.169
Tizard (1982)        Combined               Read_Comp          0.879         0.369         1.390
Heller (1993)        ParentRpt_vs_Control   Combined           1.496         0.881         2.110
Miller (1993)        Combined               Combined           0.164        -0.557         0.884
Roeder (1993)        Parent_vs_Control      Math_Ach           0.123        -0.445         0.692
Fantuzzo (1995)      Combined               Combined           0.741        -0.047         1.529
Ellis (1996)         Parent_vs_Control      Combined          -0.116        -0.652         0.420
Joy (1996)           Combined               Cr_Math_Ach        0.114        -0.842         1.071
Peeples (1996)       Parent_vs_Control      Combined           0.920         0.345         1.495
Kosten (1997)        Parent_vs_Control      Science_Ach        0.075        -0.573         0.723
Hewison (1988)       Combined               Read_Comp          0.646         0.089         1.203
Meteyer (1998)       Parent_vs_Control      Combined           0.381        -0.164         0.925
Powell-Smith (2000)  Combined               Combined          -0.298        -1.076         0.480
Fixed model                                                    0.430         0.299         0.561
Random model                                                   0.453         0.248         0.659

Forest plot axis (Hedges's g and 95% CI): -2.00 to 2.00; values below 0.00 favor control, values above 0.00 favor treatment.
Figure: Efficacy of Parent Involvement on Student Achievement
Heterogeneity Statistics for a Fixed Effects Model: Q=35.6, df=17, Prob.=0.005, and I Squared=52.3%.
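The fixed effects summary and the heterogeneity statistics can be approximately reconstructed from the study-level values in the figure. The sketch below is not the reviewers' own procedure. It assumes that each study's standard error can be backed out from the width of its 95% interval (width divided by 2 x 1.96), then applies standard inverse-variance weighting, Cochran's Q, and I Squared = (Q - df) / Q. The function name `fixed_effect_summary` is hypothetical, and the random effects row is not reproduced.

```python
import math

# (study, Hedges's g, lower 95% limit, upper 95% limit), copied from the figure.
STUDIES = [
    ("Ryan (1964)", 0.347, 0.088, 0.605),
    ("Aronson (1966)", 1.109, 0.421, 1.798),
    ("Clegg (1971)", 0.776, -0.098, 1.651),
    ("Hirst (1974)", 0.181, -0.217, 0.579),
    ("Henry (1974)", 0.281, -0.677, 1.239),
    ("O'Neil (1975)", 0.223, -0.724, 1.169),
    ("Tizard (1982)", 0.879, 0.369, 1.390),
    ("Heller (1993)", 1.496, 0.881, 2.110),
    ("Miller (1993)", 0.164, -0.557, 0.884),
    ("Roeder (1993)", 0.123, -0.445, 0.692),
    ("Fantuzzo (1995)", 0.741, -0.047, 1.529),
    ("Ellis (1996)", -0.116, -0.652, 0.420),
    ("Joy (1996)", 0.114, -0.842, 1.071),
    ("Peeples (1996)", 0.920, 0.345, 1.495),
    ("Kosten (1997)", 0.075, -0.573, 0.723),
    ("Hewison (1988)", 0.646, 0.089, 1.203),
    ("Meteyer (1998)", 0.381, -0.164, 0.925),
    ("Powell-Smith (2000)", -0.298, -1.076, 0.480),
]


def fixed_effect_summary(studies, z=1.96):
    """Inverse-variance fixed effects pooling with Cochran's Q and I Squared."""
    effects = [g for _, g, _, _ in studies]
    # Assumption: SE is recovered from the reported CI width.
    weights = [1.0 / ((upper - lower) / (2 * z)) ** 2
               for _, _, lower, upper in studies]
    total_weight = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "lower": pooled - z * pooled_se,
        "upper": pooled + z * pooled_se,
        "Q": q,
        "df": df,
        "I2": i_squared,
    }


summary = fixed_effect_summary(STUDIES)
print({name: round(value, 3) for name, value in summary.items()})
```

Run as written, the pooled estimate and heterogeneity statistics should land close to the Fixed row and to the Q and I Squared values reported with the figure. Small differences are expected because the standard errors are reconstructed from rounded limits.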
25
Example: Petrosino et al on Scared Straight Trials
Over 600 articles that are possibly relevant to the impact of Scared Straight
Only 15 reach a “reasonable” level of scientific standard
Only 7 reach the standard of being a randomized controlled trial.
26
Figure 1. The effects of Scared Straight and other juvenile awareness programs on juvenile delinquency: random effects model, “first effect,” reported in the study (Petrosino, Turpin-Petrosino, and Buehler, 2002)
n=number of failures
N=number of participants
CI=confidence intervals
Random=random effects model assumed
27
C2 Product: Scared Straight
Pro Humanitate Award
Observational Studies
Ashcroft: -50% crime
Buckner: 0%
Berry: -5%
Mitchell: -53%
Several dozen others
Randomized Trials
Mich: +26% crime
Gtr Egypt: +5%
Yarb: +1%
Orchow: +2%
Vreeland: +11%
Finckenauer: +30%
Lewis: +14%
28
Scientific Ethic
Providing access to scientific reports of evaluations of the effect of interventions, e.g. journal publications and limited circulation reports from governments or private organizations
Providing information beyond reports to assure understanding
In principle, but not always in practice, providing access to micro-records from impact evaluations
29
Ethics of Research on Humans
Evidence Grading Schemes and organizations need not worry about individual privacy because they do not, as yet, have access to individuals’ records in identifiable form
They rely only on statistical/scientific reports that are published in peer reviewed journals and other reports and which include no individual records.
30
Ethics and Law: US
Individual rights to privacy are routinely assured on account of professional ethics statements and laws in the US.
The relevant codes of professional ethics in US include those of AERA, ASA, AAPOR, APA, and others.
The relevant laws in the US include the Family Educational Rights and Privacy Act (FERPA), the Privacy Act, and HIPAA
31
Ethics and Randomized Controlled Trials
Relevant codes and law concern individual privacy and confidentiality of individuals’ identifiable micro-records
Relevant regulations and codes include attention to informed consent (45 CFR 46)
Access to anonymous micro-records for secondary analysis is problematic and possibly unnecessary in this context
32
Appendices
33
Robert Boruch: Bio
Boruch is the University Trustee Chair Professor in the Graduate School of Education and the Statistics Department of the Wharton School at the University of Pennsylvania, Philadelphia, Pennsylvania
Boruch is a Fellow of the American Statistical Association, Academy of Experimental Criminology, American Academy of Arts and Sciences, and American Educational Research Association
34
Provision to Advance Rigorous Evaluations in Legislation
The program shall allocate X% of program funds [or $Y million] to evaluate the effectiveness of funded projects using a methodology that –
– Includes, to the maximum extent feasible, random assignment of program participants (or entities working with such persons) to intervention and control groups; and
– Generates evidence on which program approaches and strategies are most effective.
The program shall require program grantees, as a condition of grant award, to participate in such evaluations if asked, including the random assignment.
35
Provision to Advance Replication of Research-Proven Interventions
Agency shall establish a competitive grant program focused on scaling up research-proven models
Grant applicants shall –
– Identify the research-proven model they will implement, including supporting evidence (well-designed RCTs showing sizeable, sustained effects on important outcomes);
– Provide a plan to adhere closely to key elements of the model; and
– Obtain sizeable matching funds from other sources, especially large formula grant programs.
36
A Focus on Data Bases that Concern “What Works”
Here, the focus is on projects that generate evidence about “what works” and what does not work, using good scientific standards
This is different from a focus on projects or programs that generate information on the nature of a problem, monitor program compliance with law, etc.
37
What are the Campbell Collaboration (C2) Assumptions?
Public interest in evidence based policy and practice will increase.
Scientific and government interest in cumulation and synthesis of evidence on “what works” will increase.
Access to information and evidence of dubious quality, and the need to screen for quality of evidence, will increase.
The use of randomized controlled trials to generate trustworthy evidence on what works will increase.
38
What are the Products?
1. Registries of C2 Systematic Reviews of the effects of interventions (C2-RIPE)
2. Registries of reports of randomized trials and non-randomized trials (C2-SPECTR) and future reports of randomized trials (C2-PROT)
3. Standards of evidence for conducting C2 Systematic reviews
4. Annual Campbell Colloquia
5. Training for producing reviews
6. New technologies and methodologies
7. Web site: http://www.campbellcollaboration.org
39
What are Other C2 Products?
C2 Trials Register (C2 SPECTR): 13,000 entries
Annals of the American Academy of Political and Social Science: Special Issues
C2 Prospective Trials Register
C2 Policy Briefs
Annual and Intermediate Meetings: London, Philadelphia, Stockholm, Lisbon, Paris, Oslo, Copenhagen, Helsinki, Los Angeles
40
Hand Search vs Machine Based Search
Journal of Educational Psychology (’03-’06)
Hand search: RCT=66
Full Text Elec N=99: 59% accurate, 41% false positives, 24% false negatives
Abstract only Elec N=11: 91% accurate, 9% false positive, 85% false negative
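The abstract-only figures can be reproduced with simple proportions. The sketch below rests on several assumptions that the slide does not state: "accurate" is read as the share of electronic hits that are true RCTs, "false negative" as the share of the 66 hand-found RCTs that the electronic search missed, and 10 of the 11 abstract-only hits are taken to be true RCTs (inferred from the 91% figure). The helper name `search_performance` is hypothetical.

```python
def search_performance(hits, true_hits, total_relevant):
    """Return (accurate, false_positive, false_negative) as proportions.

    hits:           number of reports the electronic search returned
    true_hits:      how many of those reports are real RCTs
    total_relevant: number of RCTs found by the hand search (the benchmark)
    """
    accurate = true_hits / hits
    false_positive = (hits - true_hits) / hits
    false_negative = (total_relevant - true_hits) / total_relevant
    return accurate, false_positive, false_negative


# Abstract-only search: 11 hits, assumed 10 true RCTs, 66 RCTs found by hand.
acc, fp, fn = search_performance(hits=11, true_hits=10, total_relevant=66)
print(f"accurate={acc:.0%}, false positive={fp:.0%}, false negative={fn:.0%}")
```

The full-text figures are not reconstructed here because the counts behind them are not given on the slide.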
41
What Is the Value Added?
Building a cumulative knowledge base
Developing exhaustive searches
Producing transparent and uniform standards of evidence
International scope
Periodic updating
Making reviews accessible
42
C2 Futures/Tensions
C2 Production: AIR and others
C2 Publications v journals
C2 and governments and C2 apart from governments
C2 and Sustainability, C2 as voluntary Organization versus C2 and Spin Off Organizations and Products
43
What are Other Illustrative Reviews?
“Scared Straight” Programs (Done, Award)
Multi-systemic Therapy (Done)
Parental Involvement (Done)
After School programs (Due 12/05)
Peer Assisted Learning
Counter Terrorism Strategies (Under revision)
Reducing Illegal Firearms Possession