The Campbell Collaboration www.campbellcollaboration.org
C2 Training: Oslo 2009
Effect Size Calculation II: Advanced Techniques
C2 Training Materials – Oslo – May 2009 www.campbellcollaboration.org
A brief introduction to effect sizes
Meta-analysis expresses the results of each study using a quantitative
index of effect size (ES).
ESs are measures of the strength or magnitude of a relationship of
interest.
ESs have the advantage of being comparable (i.e., they estimate the
same thing) across all of the studies and therefore can be
summarized across studies in the meta-analysis.
Also, they are relatively independent of sample size.
Effect Size Basics
• Effect sizes can be expressed in many different metrics
– d, r, odds ratio, risk ratio, etc.
• So be sure to be specific about the metric!
• Effect sizes can be unstandardized or standardized
– Unstandardized = expressed in measurement units
– Standardized = expressed in standardized measurement units
Types of effect size
Most reviews use effect sizes from one of three families of effect sizes:
• the d family, including the standardized mean difference,
• the r family, including the correlation coefficient, and
• the odds ratio (OR) family, including proportions and other measures
for categorical data.
Effect size computation
• Compute a measure of the “effect” of each study as our outcome
• Range of effect sizes:
– Differences between two groups on a continuous measure
– Relationship between two continuous measures
– Differences between two groups on frequency or incidence
Types of effect sizes
• Standardized mean difference
• Correlation Coefficient
• Odds Ratios
Correlational data
$$
\begin{array}{c|cc}
1 & X_{11} & X_{12} \\
\vdots & \vdots & \vdots \\
n & X_{n1} & X_{n2}
\end{array}
$$
Correlation Coefficient (r)
• Also a standardized effect size
• Relatively understandable to a wide
range of people
• If r = 0 then there is no relationship
• r = bivariate correlation
– two continuous variables
• rpb = point-biserial correlation
– one continuous and one dichotomous
variable
• φ = phi coefficient (pronounced “fee”)
– two dichotomous variables
$$r = \frac{\sum z_x z_y}{n}$$
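The z-score definition of r on this slide can be checked numerically. This is a minimal Python sketch, assuming z-scores are computed with the population standard deviation (divisor n), which is what makes the mean product of z-scores equal Pearson's r. The function name and the data are made up for illustration.

```python
import math

def pearson_r_from_z(x, y):
    """Pearson r as the mean product of z-scores (population SD, divisor n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / n

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r_from_z(x, y)
print(round(r, 3))
```

If the sample standard deviation (divisor n − 1) is used for the z-scores instead, the sum of products has to be divided by n − 1 to give the same r.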
Correlation data
$$ES_r = r$$
$$ES_{Z_r} = 0.5 \log_e\left[\frac{1 + ES_r}{1 - ES_r}\right]$$
Standard error of z-transform
$$SE_{Z_r} = \frac{1}{\sqrt{n - 3}}$$
Example
$$ES_r = 0.39$$
$$ES_{Z_r} = 0.5 \log_e\left[\frac{1 + 0.39}{1 - 0.39}\right] = 0.41$$
Standard error of z-transform
$$SE_{Z_r} = \frac{1}{\sqrt{100 - 3}} = 0.10$$
95% confidence interval for z
$$0.41 \pm 1.96 \times 0.10 = [0.21,\ 0.61]$$
To translate back to r-metric
$$r = \frac{e^{2 ES_{Z_r}} - 1}{e^{2 ES_{Z_r}} + 1}$$
Confidence interval in r-metric
[0.21, 0.54]
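The whole sequence for this example (transform r to z, compute the standard error, build the interval in the z-metric, transform the limits back to r) can be sketched in a few lines of Python. This is a minimal sketch using only the formulas on these slides, with r = 0.39 and n = 100. The function names are illustrative.

```python
import math

def fisher_z(r):
    """Fisher z-transform of a correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def se_fisher_z(n):
    """Standard error of the z-transformed correlation."""
    return 1 / math.sqrt(n - 3)

def z_to_r(z):
    """Back-transform from the z-metric to the r-metric."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def r_confidence_interval(r, n, crit=1.96):
    """Confidence interval for r, built in the z-metric and back-transformed."""
    z = fisher_z(r)
    half_width = crit * se_fisher_z(n)
    return z_to_r(z - half_width), z_to_r(z + half_width)

z = fisher_z(0.39)
se = se_fisher_z(100)
lower, upper = r_confidence_interval(0.39, 100)
print(round(z, 2), round(se, 2), round(lower, 2), round(upper, 2))
```

Note that the interval is symmetric around 0.41 in the z-metric but not around 0.39 in the r-metric.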
Computing correlations
• Usually straightforward if correlation matrix given
• Problem arises when regression is used in primary study
• Becker & Wang paper:
http://www.msu.edu/~mkennedy/TQQT/
Example with correlations
• Example is from:
Reynolds, A. J., Ou, S.-R., & Topitzes, J. W. (2004). Paths of
effects of early childhood intervention on educational
attainment and delinquency. Child Development, 75(5),
1299-1328.
• Study looked at the effects of preschool participation for
1,404 low-income children in the Chicago Longitudinal Study
on several later outcomes such as high school completion
and delinquency
Example with correlations
We will compute the Fisher z-transformation for the correlation between
retention and high school completion.
Example: ITBS Word analysis & H.S. Completion
• r = 0.173, N = 1,286
$$ES_r = 0.173$$
$$ES_{Z_r} = 0.5 \log_e\left[\frac{1 + 0.173}{1 - 0.173}\right] = 0.175$$
$$SE_{Z_r} = \frac{1}{\sqrt{1{,}286 - 3}} = \frac{1}{35.82} = 0.028$$
$$0.175 \pm 1.96(0.028) = (0.120,\ 0.230)$$
Odds ratios as effect sizes
• Odds in the treatment group ÷ odds in the control group
• Odds ratios are relatively hard for people to understand
• If OR = 1, odds were equal in both groups
• OR = 2.0 is as strong as OR = .50 (but effects are in opposite directions)

             No re-arrest   Re-arrest
Treatment        (a)           (b)
Control          (c)           (d)

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
Outcomes of one study
Drummond et al. (1990)   Success   Failure   TOTAL
Treatment                    5        14       19
Comparison                   6        12       18
TOTAL                       11        26       37
Odds of improving, ΩTrt
$$\Omega_{Trt} = \frac{\text{Prob(Success} \mid \text{Treatment)}}{\text{Prob(Failure} \mid \text{Treatment)}} = \frac{\text{Prob(S} \mid \text{Trt)}}{1 - \text{Prob(S} \mid \text{Trt)}}$$
Odds of improving, ΩTrt
Estimate Ω_Trt by O_E:
$$O_E = \frac{\#\ \text{successes} / \text{total}\ \#\ \text{trt}}{\#\ \text{failures} / \text{total}\ \#\ \text{trt}} = \frac{5/19}{14/19} = \frac{5}{14}$$
Odds of improving, ΩCntl
Estimate Ω_Cntl by O_E':
$$O_{E'} = \frac{\#\ \text{successes} / \text{total}\ \#\ \text{cntl}}{\#\ \text{failures} / \text{total}\ \#\ \text{cntl}} = \frac{6/18}{12/18} = \frac{6}{12}$$
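Odds ratio, ω
$$\omega = \frac{\Omega_{Trt}}{\Omega_{Cntl}}$$
estimated by
$$o = \frac{O_E}{O_{E'}} = \frac{\#\ \text{trt success} / \#\ \text{trt failures}}{\#\ \text{cntl success} / \#\ \text{cntl failures}}$$
$$= \frac{\#\ \text{trt s}}{\#\ \text{trt f}} \div \frac{\#\ \text{cntl s}}{\#\ \text{cntl f}} = \frac{\#\ \text{trt s} \times \#\ \text{cntl f}}{\#\ \text{trt f} \times \#\ \text{cntl s}}$$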
Example
$$\text{Odds ratio, } o = \frac{5 \times 12}{6 \times 14} = \frac{60}{84} = 0.71$$
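The same arithmetic can be written as a short Python sketch. It uses the Drummond et al. (1990) counts from the earlier slide (treatment: 5 successes, 14 failures; comparison: 6 successes, 12 failures). The function names are illustrative, and the cell labels a, b, c, d follow the success/failure layout used in these slides.

```python
def odds(successes, failures):
    """Estimated odds of success in one group."""
    return successes / failures

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a, b = treatment successes, treatment failures
    c, d = comparison successes, comparison failures
    """
    return (a * d) / (b * c)

o_trt = odds(5, 14)    # odds of success in the treatment group
o_cntl = odds(6, 12)   # odds of success in the comparison group
o = odds_ratio(5, 14, 6, 12)
print(round(o_trt, 3), round(o_cntl, 3), round(o, 2))
```

Dividing the two group odds and using the cross-product ad/bc are the same calculation, so either form can be used depending on what a study reports.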
Outcomes of one study
Frequencies   Success   Failure
Treatment        a         b
Comparison       c         d
Odds ratio, o or ES_OR
$$ES_{OR} = \frac{ad}{bc}$$
Interpretation of ES_OR
• ES_OR = 1, Treatment & Control equally effective
• ES_OR > 1, Treatment successes more likely than Control successes
• 0 < ES_OR < 1, Treatment successes less likely than Control successes
Information for a 2 x 2 table
• MST n = 92
• IT (Control) n = 84
• 26.1% of MST group re-arrested
• 71.4% of IT group re-arrested
2 x 2 Table
        Re-arrested           Not arrested
MST     26.1% of 92 = 24      92 – 24 = 68
IT      71.4% of 84 = 60      84 – 60 = 24
Odds ratio
$$ES_{OR} = \frac{24 \times 24}{68 \times 60} = 0.14$$
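When a study reports only group sizes and percentages, the 2 x 2 cell counts have to be rebuilt before the odds ratio can be computed. This is a minimal Python sketch of that step for the MST example, assuming counts are rounded to whole people. The helper name is illustrative.

```python
def counts_from_rate(n, event_rate):
    """Rebuild (events, non-events) from a group size and an event proportion."""
    events = round(n * event_rate)  # assumption: round to whole people
    return events, n - events

mst_rearrested, mst_not = counts_from_rate(92, 0.261)
it_rearrested, it_not = counts_from_rate(84, 0.714)

# Odds of re-arrest in MST divided by odds of re-arrest in IT
es_or = (mst_rearrested * it_not) / (mst_not * it_rearrested)
print(mst_rearrested, mst_not, it_rearrested, it_not, round(es_or, 2))
```

An odds ratio well below 1 here means the odds of re-arrest were much lower in the MST group than in the IT group.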
Example
• Take Table 1 from the Ogden study. Compute the odds ratio
for the odds of children in MST who are in out-of-home
placement versus the odds of children in usual child services
who are in out-of-home placement
2 x 2 Table
                               Out of home placement   In home placement
MST                            9.4% of 59 = 5.55       90.6% of 59 = 53.45
Usual child welfare services   41.9% of 37 = 15.5      58.1% of 37 = 21.5
Why Do We Need to Interpret Effect Sizes?
• The importance of some intervention effects is intuitively understood
– Change in earning power
• “College graduates will earn $XX more in their lifetimes than non-graduates.”
– Risk ratio
• “…are 1.4 times more likely to …”
– Grade level equivalency
• “students receiving the intervention scored 5.3 GLE while students not receiving the intervention scored 4.9 GLE.”
• But, most are not …
– Statistically significant effect
– Correlation of +.35, d = -.15
• In most cases, we’ll be working with effects that have to be translated so people will have some idea how to interpret them
Options for Expressing Study Results in an Understandable
Metric
• Statistical significance
– Sometimes naively used as a proxy for effect size
• But trivially small effects can be statistically significant
• And large effects can be statistically nonsignificant
• Remember, a p-value expresses the likelihood of observing a
result at least this big, assuming a true null hypothesis
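The two points about significance above can be illustrated with a rough calculation. This is a sketch only: it assumes two equal-sized groups, a large-sample normal approximation, and SE(d) ≈ sqrt(2/n), none of which come from the slides. The function name and the example values are hypothetical.

```python
import math

def two_sided_p_for_d(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d.

    Assumes equal group sizes and a normal approximation with
    SE(d) ~ sqrt(2 / n_per_group); this is crude for small samples.
    """
    se = math.sqrt(2.0 / n_per_group)
    z = abs(d) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(z / math.sqrt(2))

# A trivially small effect with a very large sample
p_tiny_effect = two_sided_p_for_d(0.05, 10_000)
# A medium-sized effect with a small sample
p_medium_effect = two_sided_p_for_d(0.50, 20)
print(p_tiny_effect < 0.05, p_medium_effect < 0.05)
```

Under these assumptions the small effect is statistically significant and the larger one is not, which is why the p-value cannot stand in for the effect size.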
More on ES and Statistical Significance
• Some students learn that if a statistical test fails to
reject the null, it means that the population effect is
zero
– For example, that the intervention is ineffective
– This is one reason people confuse statistical significance
with practical significance (as in, if it is not statistically
significant it can’t be practically significant)
– However…
Point Estimation vs. Interval Estimation
• Interval estimation
– Confidence intervals tell us the likely range of population values
• If a study has a confidence interval for IQ scores ranging from .1 to 10.1 points, that is the likely range of the treatment effect as suggested by this study
• Point estimation
– Point estimates (e.g., the mean) tell us the most likely value of the population parameter
Point estimation and interval estimation are best kept separate.
Asserting that the treatment effect is zero if the test is not statistically significant confounds these two activities.
Counternull Value of an Effect Size
• The counternull value of an effect size points out this
problem
– Assume a study finds d = +.30, p = .10
– Classic H0: $Y_1 = Y_2$, or $Y_1 - Y_2 = 0$
– Counternull H0: $Y_1 - Y_2 = .60$
There is exactly as much evidence supporting the “classic” null hypothesis as there is the counternull hypothesis! (The ES is not statistically different from either 0 or +.60)
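The counternull value in this example is simply the observed effect reflected around itself. This is a minimal Python sketch, assuming the usual definition for symmetric sampling distributions (counternull = 2 × observed effect − null value). The function name is illustrative.

```python
def counternull(es_observed, es_null=0.0):
    """Counternull value of an effect size, assuming a symmetric
    sampling distribution: as far above the estimate as the null is below."""
    return 2 * es_observed - es_null

d_observed = 0.30
d_counternull = counternull(d_observed)
print(round(d_counternull, 2))
```

The null value (0) and the counternull value lie at equal distances on either side of the observed d, so the data support them equally.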
Proportion of Variance Explained
• Common for correlations (r²), multiple regression (R²)
• Research suggests that neither experienced researchers nor
experienced statisticians have a good feel for the practical
meaning of this type of effect size (Rosenthal, 1984)
– Typically, even well-trained individuals underestimate the importance
of results when stated in terms of proportion of variance explained
– Not to mention policy makers and the general public
More on Proportion of Variance Explained
• Consider a study
– Program designed to improve graduation rate among “at-risk”
students
– φ = +.32, φ² = .10
• Remember, φ is a correlation with 2 dichotomous variables
– Using proportion of variance as the effect size, one might be tempted
to label this a small or even trivial effect, as only 10% of the variance
in graduation rates can be attributed to the intervention. But …
Binomial Effect Size Display
                        Graduated   Did not graduate
Received intervention      66              34
Control                    34              66
φ = .32
Physician’s Aspirin Study
Subsequent heart attack rates
           Heart attack   No heart attack
Aspirin        104            10,933
Placebo        189            10,845
φ = .03, φ² = .0009, p < .0001, OR = .55, Risk ratio = .55 (45% fewer men who take aspirin have a second heart attack)

Fatality rates, given second heart attack
           Fatal   Non-fatal
Aspirin      5         99
Placebo     18        171
φ = .08, φ² = .006, p = .16, OR = .48, Risk ratio = .51
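The statistics for the subsequent heart attack table can be recomputed from the four cell counts. This is a minimal Python sketch, assuming the standard formula for φ in a 2 x 2 table. The function name is illustrative, and the counts are the ones on this slide (aspirin: 104 heart attacks, 10,933 none; placebo: 189 and 10,845).

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

a, b, c, d = 104, 10933, 189, 10845
phi = phi_coefficient(a, b, c, d)
odds_ratio = (a * d) / (b * c)
risk_ratio = (a / (a + b)) / (c / (c + d))

print(f"phi = {phi:.3f}, phi^2 = {phi ** 2:.4f}")
print(f"OR = {odds_ratio:.2f}, RR = {risk_ratio:.2f}")
```

The sign of φ only reflects how the rows and columns are ordered. Its magnitude is tiny even though the risk ratio shows a substantial reduction, which is the contrast this slide is making. Squaring the unrounded φ gives a value slightly above the .0009 obtained by squaring the rounded .03.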
Computing the BESD
• For dichotomous outcomes, the BESD illustrates
change in “success rate” corresponding to particular
values of r
– For example, the number of additional graduates
• Computed as (simply)
Treatment group success rate = .50 + (r/2)
Control group success rate = .50 – (r/2)
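The two BESD formulas translate directly into code. This is a minimal Python sketch using the graduation example (φ = .32). The function name is illustrative.

```python
def besd(r):
    """Binomial effect size display: (treatment, control) success rates."""
    treatment_rate = 0.50 + r / 2
    control_rate = 0.50 - r / 2
    return treatment_rate, control_rate

treatment_rate, control_rate = besd(0.32)
print(f"Treatment: {treatment_rate:.0%} graduate, Control: {control_rate:.0%} graduate")
```

The difference between the two displayed success rates always equals r, which is what makes the display easy to read.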
Risk Ratios
• Defined as:
Risk ratio = (events in the treatment group ÷ treatment group n) ÷ (events in the control group ÷ control group n)
• Interpreted as “The ratio of risk in the treatment group relative to the risk in the control group”
– Risk ratio for having a second heart attack was .55
• The risk in the aspirin group was 55% of the risk in the placebo group, i.e., 45% fewer men who take aspirin have a second heart attack
Odds vs. Risk Ratios
• OR and RR are very similar when events are rare
• When events become more common, they diverge
– Study 1: OR = .40, RR = .401
– Study 2: OR = 1.50, RR = 1.25
• Generally, logged ORs have somewhat better properties for meta-analysis
– Can convert any OR to a RR for interpretation, given the risk in the control group

Study 1      Event   Non-event
Treatment       2       1000
Control         5       1000

Study 2      Event   Non-event
Treatment     500        500
Control       400        600
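Computing both measures from the counts shows how they agree for rare events and diverge for common ones. This is a minimal Python sketch using the Study 1 and Study 2 counts on this slide. The conversion function uses the standard identity RR = OR / (1 − p0 + p0 × OR), where p0 is the control-group risk. That identity and the function names are not from the slides.

```python
def odds_ratio(events_t, nonevents_t, events_c, nonevents_c):
    """Odds of the event in treatment divided by odds in control."""
    return (events_t / nonevents_t) / (events_c / nonevents_c)

def risk_ratio(events_t, nonevents_t, events_c, nonevents_c):
    """Risk of the event in treatment divided by risk in control."""
    risk_t = events_t / (events_t + nonevents_t)
    risk_c = events_c / (events_c + nonevents_c)
    return risk_t / risk_c

def or_to_rr(or_value, control_risk):
    """Convert an odds ratio to a risk ratio, given the control-group risk."""
    return or_value / (1 - control_risk + control_risk * or_value)

study1 = (2, 1000, 5, 1000)      # rare events
study2 = (500, 500, 400, 600)    # common events

for name, counts in (("Study 1", study1), ("Study 2", study2)):
    print(name, round(odds_ratio(*counts), 3), round(risk_ratio(*counts), 3))
```

The odds ratio is always at least as far from 1 as the risk ratio, and the gap grows with the control-group risk. For example, `or_to_rr(1.5, 0.4)` recovers the Study 2 risk ratio from its odds ratio.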
Risk Difference
• Interpreted as
– The difference in risks between two groups
• Defined as (a ÷ (a+b)) - (c ÷ (c+d))

           Heart attack   No heart attack
Aspirin      104 (a)        10,933 (b)
Placebo      189 (c)        10,845 (d)

104 ÷ (104 + 10,933) - 189 ÷ (189 + 10,845) = .0094 - .0171 = -.0077 (a reduction of .77 percentage points)
Number Needed to Treat
• Number needed to treat (NNT) is an additional way to interpret dichotomous outcomes
– How many people have to receive the intervention to produce one more positive (or, one less negative) event?
• Defined as
1/risk difference
• Here, NNT = 1/.0077 ≈ 130
– So, 130 men who have had a heart attack need to take aspirin to prevent one additional second heart attack
– With the fictitious program designed to increase graduation rates among “at-risk” students,
RD = .66 - .34 = .32
NNT = 1/.32 = 3.125
– For every 3.125 people who participate in the program, an additional one person will graduate
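The risk difference and NNT calculations for both examples can be sketched in Python as follows. The function names are illustrative. The aspirin counts are a = 104, b = 10,933, c = 189, d = 10,845, and NNT is taken as the reciprocal of the absolute risk difference.

```python
def risk_difference(a, b, c, d):
    """Risk in the treatment group minus risk in the control group."""
    return a / (a + b) - c / (c + d)

def number_needed_to_treat(rd):
    """NNT as the reciprocal of the absolute risk difference."""
    return 1 / abs(rd)

# Aspirin example
rd_aspirin = risk_difference(104, 10933, 189, 10845)
nnt_aspirin = number_needed_to_treat(rd_aspirin)

# Fictitious graduation program: success rates of .66 and .34
rd_graduation = 0.66 - 0.34
nnt_graduation = number_needed_to_treat(rd_graduation)

print(round(rd_aspirin, 4), round(nnt_aspirin))
print(round(rd_graduation, 2), round(nnt_graduation, 3))
```

In practice NNT is usually reported rounded up to a whole number of people, so the graduation example would be described as about 4 participants per additional graduate.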
Cohen’s Benchmarks
• Jacob Cohen (1988) proposed general definitions for
interpreting effect size estimates:
          d-index     r
Small       .20      .10
Medium      .50      .30
Large       .80      .50
More on Cohen
• Lipsey & Wilson (1993) analyzed 183 meta-analyses in the social sciences
– 25th percentile d = .25
– 50th percentile d = .38
– 75th percentile d = .62
• Cohen intended these to be “rules of thumb”, and emphasized that they represent average effects from across the social sciences
– Cautioned that in some areas, smallish effects may be more typical due to:
• Measurement error
• Relative weakness of interventions
– He did not intend these to stand for estimates of practical significance!
Converting Back to Original Metric
• It can sometimes be helpful to use the mean difference to translate back into a metric people are more accustomed to working with
– Example
• Assume we did a research synthesis and meta-analysis of the effects of homework on achievement among HS students. Outcomes included standardized test scores such as the SAT and ACT, and chapter tests. Assume overall result was d = +.20, and that type of outcome was not a moderator of effect sizes.
– SAT average = 500, SD = 100
– ACT average = 21, SD = 5
» “The overall effect suggests, for example, that the average student doing homework would see an increase in SAT scores from 500 to 520, or in ACT scores from 21 to 22.”
– Cautions
• Comparing different constructs (e.g., math achievement vs. attendance) is difficult to impossible
• Even when tests are highly similar, if their distributions are different the comparisons can be misleading
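The translation in the homework example is just d multiplied by the standard deviation of the familiar scale. This is a minimal Python sketch using the SAT and ACT values from this slide. The function name is illustrative.

```python
def shifted_mean(mean, sd, d):
    """Expected mean after an effect of d standard deviations."""
    return mean + d * sd

d = 0.20
sat_after = shifted_mean(500, 100, d)
act_after = shifted_mean(21, 5, d)
print(round(sat_after), round(act_after))
```

The cautions above still apply: the result is only as meaningful as the assumption that the pooled d transfers to the scale being used.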
Basic Strategy for Comparing Effect Sizes
• Holding intervention constant, are there differential effects across outcomes?
– Does summer school help math more than reading?
• Holding outcome constant, are there differential effects across interventions (or intervention components)?
– Does mentoring affect graduation rates more than tutoring?
Other Considerations When Comparing Effect Sizes
• Are some important outcomes completely missing from the
evidence base?
• Are some interventions or intervention components missing
from the evidence base?
• Is there covariation between interventions and study
methodology?
• Is there covariation between interventions and outcome
choice?
– Caution about comparing different mediating variables