
Meta-Analysis of Single-Case Designs

William R. Shadish, University of California, Merced

Borrowing liberally from coauthored work with Larry Hedges, David Rindskopf, James Pustejovsky, Kristynn Sullivan, Jonathan Boyajian and Eden Nagler

The research reported here was supported by a grant from the University of California Office of the President to the University of California Educational Evaluation Consortium, and by the Institute of Education Sciences, U.S. Department of Education, Grant R305D100046 to the University of California Merced. The opinions expressed are those of the author and do not represent views of the Institute or the U.S. Department of Education.

Overview of Talk
• Why Meta-Analysis of SCDs
• Past Effect Sizes and Their Limits
• Computing a d-statistic in the same metric as that used for group experiments
• How to compute power
• How to do a meta-analysis

Why Meta-Analysis of SCDs

• Evidence-Based Practice
– WWC Handbook says EBP reviews should take “into consideration the number of studies, the sample sizes, and the magnitude and statistical significance of the estimates of effectiveness”
– And also that “to adequately assess the effects of an intervention, it is important to know the statistical significance of the estimates of the effects in addition to the mean difference, effect size, or improvement index”

• To increase statistical power compared to analyses of single studies.

Past Effect Sizes

• Many have been proposed

• Versions of d

• Varieties of overlap statistics (e.g., PAND).

• These are very important, especially to measure the within-case effect.

Limitations of Past Effect Sizes

• Usually standardized using within-case rather than between-case variability (unlike the between-groups d)
– So not comparable to and cannot be combined with the between-groups d.
• Rarely take into account trend, within- versus between-case variability, autocorrelation
• Not well developed statistically
– They lack distribution theory; standard errors are unclear
– Without valid standard errors, much of modern meta-analysis cannot be applied
– Ditto for power analysis

A d-statistic for SCDs That Begins to Remedy These Limitations

• Equivalent to between-groups d
• Takes into account:
– Autocorrelation
– Ratio of between/total (between + within) variance
– Number of data points in each phase
– Number of cases in each study
• Corrects for small-sample bias
• However, it does not fix all limitations. Still assumes:
– continuous outcomes,
– absence of trends,
– fixed treatment effect across cases within studies
• And requires three cases on the same outcome to compute.
• Two versions (so far):

Two Versions

• A version for ABk designs
– Takes into account the number (k) of repetitions of the AB pair
• A version for multiple baseline designs
– Takes into account the different start times for treatment in each case.
• Manual and SPSS macros at www.faculty.edu/wshadish
– Including for power analysis

ABAB Example

[Figure: Number of intervals of disruptive behavior recorded during single-student responding (SSR, baseline) and response card (RC, treatment) conditions. Lambert et al. 2006]

Results

• G = -2.513, V{G} = 0.041
• Significance test: z = -2.51 / √.041 = -12.49
• 95% Confidence Interval: [-2.51 - 1.96·√.041, -2.51 + 1.96·√.041] = [-2.91, -2.12]
• Estimated autocorrelation = 0.225
• Estimated ICC ρ = .03 (ratio of between to total variance)
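The z and the CI follow directly from G and V{G}. A minimal base-R sketch using the rounded values from this slide (the slide's -12.49 comes from unrounded inputs):

G  <- -2.513
VG <- 0.041
z  <- G / sqrt(VG)                    # about -12.4 with these rounded inputs
ci <- G + c(-1.96, 1.96) * sqrt(VG)   # about (-2.91, -2.12)
z
ci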

Multiple Baseline Example: Saddler et al. 2008 Treatment to Improve Writing

G = 1.963, V{G} = 0.335, estimated autocorrelation = 0.010, estimated ICC ρ = 0.633.

Power

[Figure: power curves for ABk designs and multiple baseline (MB) designs, assuming ICC and autocorrelation both = .5]

Meta-Analysis Across SCD Studies: PRT

• Pivotal Response Training (PRT) for Autism
– 7 studies (12 effect sizes)
– with at least 3 cases (66 cases total),
– with outcomes about verbalization.

Snapshot of the Data

Computer Programs

• We use the R package metafor (a minimal sketch follows below)
– Syntax driven but easy (see Shadish et al. JSP)
– Very comprehensive and cutting edge
• Possible to use Stata or SAS (or others).
• Comprehensive Meta-Analysis is good
• SPSS is, unfortunately, terrible.
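For concreteness, a minimal metafor sketch. The data frame dat and its columns G and VG are hypothetical stand-ins for the d-statistics and their variances produced by the macros described earlier:

library(metafor)

# Hypothetical input: one row per effect size
# dat <- data.frame(study = ..., G = ..., VG = ...)

res <- rma(yi = G, vi = VG, data = dat, method = "REML")  # random-effects model
summary(res)  # pooled g, SE, p, tau^2 (and its se), Q test, I^2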

Multivariate Data

• Means some studies have more than one DV.
– That violates the independence assumption
• Best ways to deal with it (sketched below):
– Multivariate meta-analysis (metafor)
– Robust variance estimation (Tipton's robumeta)
– Both require knowledge of the correlation among DVs
• Simplest way is to average G and VarG within study
– Produces results that are not optimal but usually very good unless the effect size is small or variance is large
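Hedged sketches of two of these options, reusing the hypothetical dat above; robumeta's rho is the assumed correlation among DVs within a study:

library(robumeta)

# Robust variance estimation (Tipton's robumeta)
rve <- robu(G ~ 1, data = dat, studynum = study,
            var.eff.size = VG, rho = 0.8)  # rho: assumed within-study correlation
print(rve)

# Simplest way: average G and VarG within study, then meta-analyze the averages
agg <- aggregate(cbind(G, VG) ~ study, data = dat, FUN = mean)
rma(yi = G, vi = VG, data = agg, method = "REML")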

Forest Plot (Ordered by Precision)
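A sketch of how metafor draws this plot, continuing from the res fit above; ordering by precision is a built-in option:

forest(res, order = "prec")  # forest plot, most precise studies first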

Fixed vs Random Effects
• Basic principle of both: give more weight to studies that measure with less error
• Fixed effects
– w = 1/VarG
– Generalization: same studies but with different people
• Random effects
– w = 1/(VarG + τ²)
– Generalization: to other studies.

• Consensus is to use random effects
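The two weighting schemes side by side, as a minimal sketch reusing the hypothetical dat and the res fit (whose tau2 component is metafor's τ² estimate):

w_fixed  <- 1 / dat$VG               # fixed effects: sampling error only
w_random <- 1 / (dat$VG + res$tau2)  # random effects: adds between-study variance

# Either set of weights yields a weighted average of the effect sizes
sum(w_random * dat$G) / sum(w_random)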

Overall Random Effects Results

• g = 1.01, SE = 0.14, p < .001
• τ² = 0.06 (se = 0.09); Q(df = 6) = 9.140, p = .168
• I² = 87%

Cumulative Forest Plot
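metafor computes this by refitting the model as each study is added. A sketch, adding studies from most to least precise:

cf <- cumul(res, order = order(dat$VG))  # cumulative meta-analysis
forest(cf)                               # cumulative forest plot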

Diagnostic Tests

• Various Influence Statistics
• Radial Plots
• Normal Quantile-Quantile Plots
• It is not important for now that you understand these statistics
– Read Shadish et al. JSP for understanding
– Just look at them to get the general idea: looking for studies that influence results more than others (one-liners shown below)
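All three diagnostics are one-liners in metafor, sketched here on the res fit above:

inf <- influence(res)  # leave-one-out influence statistics per study
plot(inf)
radial(res)            # radial (Galbraith) plot
qqnorm(res)            # normal quantile-quantile plot of the effect sizes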

Influence Statistics
Notice that Study 4 (Schreibman et al. 2009) is consistently an outlier in these tests.

This is a substantive issue not a statistical one. Why is it an outlier? Does it have special characteristics?

For example, Schreibman et al. (2009) treated the youngest children with autism. Does PRT work less well with younger children?

Radial Plot
Helps identify which studies contribute to heterogeneity: those that fall outside the gray area.

Recall heterogeneity was not significant, so not surprisingly, none of the studies fall outside that area.

If one did fall outside, the task again is to figure out why.

Quantile-Quantile Plot
A way to test for normality of the effect sizes.

All dots (studies) should fall within the 95% confidence intervals (dotted lines) and they do.

For studies that are outside the CI, again, explore why.

Publication Bias

• Studies with significant results are more likely to be published (Rosenthal)

• Omitting the unpublished studies may overestimate the effect (the file drawer problem)

• But does that apply to SCDs since they don’t do significance tests?

Publication Bias in SCDs

• Mahoney (1977) sent SCD reviewers randomly assigned manuscripts with visually positive, negative, or mixed results.

• Manuscripts with positive results were more likely to be published.

• Why would this bias exist in SCDs?
– Traditional need for visually large effects?
– Downplaying results that are not visually large?

Publication Bias Tests

• Begg and Mazumdar’s rank correlation test
• Egger’s regression test
• Funnel plot
• Trim-and-fill
• Selection bias models
• All require homogeneous data
– Except selection bias models
– Fortunately the PRT data are homogeneous

Two Statistical Tests

• Begg and Mazumdar’s rank correlation test
– Computes the correlation between effect size and study precision
– Should be zero if no publication bias exists
– r = .29 (p = .36), suggesting no bias
• Egger’s regression test
– Predicts [G/SE] from study precision
– Can be more powerful than the rank correlation test
– Intercept should be zero in the absence of bias
– Intercept is 3.31 (df = 5, p = .021), suggesting the presence of bias
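Both tests are available in metafor, sketched on the res fit above (model = "lm" requests the classical Egger regression):

ranktest(res)               # Begg & Mazumdar rank correlation test
regtest(res, model = "lm")  # Egger's regression test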

Funnel Plot
Plots effect size against standard error.

Plot should be symmetric in the absence of publication bias (Why?)

HINT: Where are all the low precision studies showing small effects?

But can we quantify this judgment?

Trim-and-Fill
Identifies “missing” studies and fills them in (the white dots).

Recomputes the meta-analysis using them.

Doing so, g drops from 1.01 to 0.77 (se = 0.16, p < .001).

Smaller but still significant.
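Funnel plot and trim-and-fill, sketched with metafor on the res fit above:

funnel(res)          # effect size vs. standard error; should be symmetric
tf <- trimfill(res)  # impute "missing" studies and refit
tf                   # adjusted pooled estimate
funnel(tf)           # funnel plot with filled-in studies shown as open dots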

Moderator Analyses
• If data were heterogeneous, then the effect sizes differ by more than we expect by chance.
• Can we predict that extra variation?

Results of a Regression Analysis

Mixed-Effects Model (k = 6; tau^2 estimator: REML)

tau^2 (estimate of residual amount of heterogeneity): 0 (SE = 0.131)
tau (sqrt of the estimate of residual heterogeneity): 0

Test for Residual Heterogeneity:
QE(df = 2) = 0.676, p-val = 0.713

Test of Moderators (coefficient(s) 2,3,4):
F(df1 = 3, df2 = 2) = 5.778, p-val = 0.151

Model Results:

         estimate     se    tval   pval    ci.lb   ci.ub
intrcpt     0.328  0.464   0.708  0.552   -1.667   2.324
Age         0.138  0.041   3.368  0.078   -0.038   0.315  .
Sex         0.930  0.656   1.417  0.292   -1.894   3.755
Campus     -0.200  0.308  -0.649  0.583   -1.526   1.126
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
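Output in this format comes from a metafor mixed-effects call along these lines. A hedged sketch, assuming dat also carries Age, Sex, and Campus columns (test = "knha" yields the t- and F-tests shown above):

res_mod <- rma(yi = G, vi = VG, mods = ~ Age + Sex + Campus,
               data = dat, method = "REML", test = "knha")
summary(res_mod)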

None of the predictors are significant here, individually or as a whole

Not surprising because:
• Effect sizes were homogeneous
• Small number of studies, so lower power

Each year of age adds 0.138 (p = .078) to g: the effect of PRT is larger for older cases.

Summary
• Need to make further developments for:
– Alternating treatment designs
– Changing criterion designs
– Studies that combine different designs within one case, e.g., multiple baseline combined with alternating treatments
– Outcome metrics other than normally distributed ones
– Better ways of taking trend into account (we assume no trend)
• We are working on these things.
• Still, this work offers a large number of new statistical opportunities for SCD research
