A comparison of Random Effects meta-analysis methods when study effects are non-normally distributed
Evan Kontopantelis & David Reeves
NPCRDC
How meta-analysis works
• A search for papers relevant to the research question is conducted. Unsuitable papers are filtered out
• In each paper…
– for each outcome measure that is directly relevant to the RQ, or a good enough proxy, we calculate an effect (of intervention vs control) and its variance
– an overall effect and variance is selected
• Effects and their variances are combined to calculate an overall effect
[Figure: forest plot, "Chronic disease - Risk factors". x-axis: effect (ticks at -.4, 0, .4, .8). Rows: Woolard (B) 1995; Woolard (A) 1995; Eckerlund 1985; Moher 2001; Cupples 1994; Campbell 1998; Van Ree 1985; Combined]
Heterogeneity
• Heterogeneity can be attributed to clinical and/or methodological diversity
• Clinical heterogeneity: variability that arises from different populations, interventions, outcomes and follow-up times
• Methodological heterogeneity: relates to differences in trial design and quality
• Detecting (usually with Cochran’s Q test), quantifying and dealing with heterogeneity can be very hard
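The Q statistic mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `cochran_q` and the example numbers are hypothetical. Each study is weighted by the inverse of its variance, and under homogeneity Q is compared against a chi-square distribution with k-1 degrees of freedom.

```python
def cochran_q(effects, variances):
    """Return (Q, df) for Cochran's heterogeneity test.

    effects   : list of study effect estimates
    variances : list of within-study variance estimates
    """
    # Inverse-variance weights: precise studies count for more.
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted mean of the study effects.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical data: three studies with equal variances.
q, df = cochran_q([0.1, 0.3, 0.5], [0.04, 0.04, 0.04])
print(q, df)
```

A Q value well above its degrees of freedom suggests heterogeneity, although the test is known to have low power when the number of studies is small.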
Absence of heterogeneity
• Assumes that the true effects of the studies are all equal and deviations occur because of imprecision of results
• Analysed with fixed-effects method
$Y_i = \theta + e_i$
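As a rough sketch of the fixed-effects approach, the pooled effect is the inverse-variance weighted mean of the study effects, with a normal-theory confidence interval. The function name `fixed_effect` and the example data are hypothetical and only illustrate the standard inverse-variance estimator.

```python
import math
from statistics import NormalDist


def fixed_effect(effects, variances, level=0.95):
    """Inverse-variance fixed-effects estimate with a normal-theory CI.

    Returns (estimate, lower, upper).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    # Under the fixed-effects model the variance of the pooled
    # estimate is the reciprocal of the sum of the weights.
    se = math.sqrt(1.0 / total)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate, estimate - z * se, estimate + z * se


# Hypothetical data: three studies with equal variances.
est, lo, hi = fixed_effect([0.1, 0.3, 0.5], [0.04, 0.04, 0.04])
print(est, lo, hi)
```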
Presence of heterogeneity
• It is assumed that there exists variation in the size of the true effect among studies (in addition to the imprecision in results)
• Analysed with random-effects methods
$Y_i = \theta_i + e_i$
Random-effect MA methods
• Estimate the between-study variance $\tau^2$ and use it in estimating the overall effect
• Parametric:
– DerSimonian-Laird (1986)
– Maximum & Profile likelihood (1996)
• Non-parametric:
– Permutations method (1999)
– Non Parametric Maximum Likelihood (1999)
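To make the idea of estimating the between-study variance and feeding it into the overall effect concrete, here is a minimal sketch of the DerSimonian-Laird moment estimator. The function name `dersimonian_laird` and the example data are hypothetical. The between-study variance is estimated from Cochran's Q, truncated at zero, and then added to each within-study variance before re-weighting.

```python
import math
from statistics import NormalDist


def dersimonian_laird(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects estimate.

    Returns (tau2, estimate, lower, upper).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    s1 = sum(w)
    s2 = sum(wi ** 2 for wi in w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / s1
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    # Moment estimator of the between-study variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / (s1 - s2 / s1))
    # Random-effects weights include the between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    total = sum(w_star)
    estimate = sum(wi * y for wi, y in zip(w_star, effects)) / total
    se = math.sqrt(1.0 / total)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tau2, estimate, estimate - z * se, estimate + z * se


# Hypothetical data: three studies with clearly different effects.
tau2, est, lo, hi = dersimonian_laird([0.0, 0.5, 1.0], [0.04, 0.04, 0.04])
print(tau2, est, lo, hi)
```

Note that the interval treats the estimated between-study variance as if it were known.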
“Potential” problems?
• Heterogeneity is common & the FE model is under fire
• Parametric RE models assume that both the effects and errors are normally distributed
• Almost all RE models (except PL) do not take account of uncertainty in $\hat{\tau}^2$
• DL is usually the preferred method of analysis because it is easy to implement and is available in all software packages
So far…
• The number of studies and the amount of heterogeneity have been found to affect method performance
• Performance comparisons usually focus on coverage and ignore power, or omit some important methods (e.g. PL, PE)
• Evaluations were based on normal data: method robustness has not been assessed with non-normal data
Our bit
In a nutshell
• Simulated various non-normal distributions for the true effects: skew normal, bimodal, beta, uniform, U and others
• Created datasets of 10000 meta-analyses for various numbers of studies k and different degrees of heterogeneity, for each distributional assumption
• Compared FE, DL, ML, PL and PE methods (along with a simple t-test) in terms of coverage and power across all datasets
Generating the data
• For a single study we simulated the effect size estimate $Y_i$ and the within-study variance estimate $\hat{\sigma}_i^2$ of a binary outcome
• The variance $\hat{\sigma}_i^2$ was assumed to be a realisation from a $\chi^2_1$ distribution, multiplied by .25 and restricted to the (.009, .6) interval
• $Y_i$ involves two components, $Y_i = \theta_i + e_i$, where
– $e_i \sim N(0, \hat{\sigma}_i^2)$
– $\theta_i \sim\ ?(0, \tau^2)$
• Four $\tau^2$ values were used: .01, .03, .07 & .1
• Number of studies $k$ (MA size) varied from 2 to 35
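The data-generating steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code. All function names are hypothetical. "Restricted to the (.009, .6) interval" is implemented here by redrawing until the value falls inside the interval, and only two true-effect shapes (normal and uniform, both with mean 0 and variance tau2) are shown as examples.

```python
import math
import random


def draw_within_variance(rng):
    """0.25 times a chi-square(1) draw, kept only if inside (.009, .6).

    Assumption: the restriction is done by rejection (redrawing).
    """
    while True:
        v = 0.25 * rng.gauss(0.0, 1.0) ** 2  # chi-square(1) = squared N(0,1)
        if 0.009 < v < 0.6:
            return v


def draw_true_effect(rng, tau2, shape="normal"):
    """True study effect with mean 0 and variance tau2."""
    if tau2 == 0:
        return 0.0
    if shape == "normal":
        return rng.gauss(0.0, math.sqrt(tau2))
    if shape == "uniform":
        # U(-a, a) has variance a^2 / 3, so a = sqrt(3 * tau2).
        a = math.sqrt(3.0 * tau2)
        return rng.uniform(-a, a)
    raise ValueError("unsupported shape: " + shape)


def simulate_meta_analysis(k, tau2, shape="normal", seed=None):
    """Return (effects, variances) for one simulated meta-analysis."""
    rng = random.Random(seed)
    effects, variances = [], []
    for _ in range(k):
        var_i = draw_within_variance(rng)
        theta_i = draw_true_effect(rng, tau2, shape)
        e_i = rng.gauss(0.0, math.sqrt(var_i))  # within-study error
        effects.append(theta_i + e_i)
        variances.append(var_i)
    return effects, variances


effects, variances = simulate_meta_analysis(k=10, tau2=0.03, shape="uniform", seed=1)
print(len(effects), min(variances), max(variances))
```

Other shapes (skew normal, bimodal, beta, U) would slot into `draw_true_effect` in the same way, as long as each is scaled to mean 0 and variance tau2.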
Details on the MA methods
• Fixed effects (FE)
• DerSimonian-Laird (DL)
• Q method (Q)
• Maximum Likelihood (ML)
• Profile Likelihood (PL)
• Permutations method (PE)
• T-test method (T)
Performance
• For each simulated meta-analysis case we calculated confidence intervals for the overall effect estimate, for all the methods
• Coverage: % of confidence intervals that contain the true overall effect in a sample of 10000 meta-analyses
• Power: % of CIs that do not contain the 25th centile of the population distribution of the 10000 effect sizes
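The two performance measures defined above reduce to simple counts over the simulated confidence intervals. The sketch below uses hypothetical function names and made-up intervals; it only illustrates the definitions, with the true overall effect and the 25th-centile reference value passed in by the caller.

```python
def coverage(intervals, true_effect):
    """Percentage of (lower, upper) intervals containing true_effect."""
    hits = sum(1 for lo, hi in intervals if lo <= true_effect <= hi)
    return 100.0 * hits / len(intervals)


def power(intervals, reference):
    """Percentage of (lower, upper) intervals NOT containing reference.

    In the setting described above, reference would be the 25th centile
    of the population distribution of effect sizes.
    """
    misses = sum(1 for lo, hi in intervals if not (lo <= reference <= hi))
    return 100.0 * misses / len(intervals)


# Hypothetical confidence intervals from four simulated meta-analyses.
cis = [(-0.1, 0.2), (0.05, 0.3), (-0.3, -0.05), (-0.2, 0.1)]
print(coverage(cis, 0.0), power(cis, -0.15))
```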
Results
Zero $\tau^2$
Normally distributed $\theta_i$
Skew normal $\theta_i$
Bimodal $\theta_i$
PL performance across various distributions
Drawing conclusions
Summary
• Within any given method, the results were consistent across all types of distribution shape
• This can give researchers confidence that methods are highly robust against even the most severe violation of the assumption of normally distributed effect sizes
• If it is reasonable to assume that the effect size does not vary between studies, the FE, Q and ML methods all provide accurate coverage coupled with good power
In the presence of heterogeneity…
• However, zero between-study variance is the exception rather than the norm, and the presence of even a moderate amount of $\tau^2$ alters the picture considerably
• FE, Q and ML quickly lose coverage as heterogeneity increases
• DL rapidly goes from providing a coverage that is overly high, to one that is overly low
• PE, and to a lesser extent PL, now provide the best coverage, even with very small sample sizes
Which method then?
• If priority is given to maintaining an accurate Type I error rate then the simple t-test is the best method. But its power is very low, making it a poor choice when control of the Type II error rate is also important
• PE gives accurate coverage in all situations and has better power than T, but the method is more difficult to implement and cannot be used with fewer than 6 studies
• PL has ‘reasonable’ coverage in most situations, giving it an edge over other methods
Current & future work
• Created a freely available Excel add-in that implements all the described MA methods and various measures of heterogeneity
• Working on a Stata module that will do the same
• Investigate performance of heterogeneity measures under non-normally distributed data
Main references
• Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Stat Med 2001; 20(6):825-840
• Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med 2000; 19(13):1707-1728
• Follmann DA, Proschan MA. Valid inference in random effects meta-analysis. Biometrics 1999; 55(3):732-737
• Hardy RJ, Thompson SG. A likelihood approach to meta-analysis with random effects. Stat Med 1996; 15(6):619-629
• Micceri T. The Unicorn, the Normal Curve, and Other Improbable Creatures. Psychological Bulletin 1989; 105(1):156-166
• Ramberg JS, Dudewicz EJ, Tadikamalla PR, Mykytka EF. A Probability Distribution and Its Uses in Fitting Data. Technometrics 1979; 21(2):201-214
Thank you for listening