
Problems with the Design and Implementation of Randomized Experiments

By Larry V. Hedges

Northwestern University

Presented at the 2009 IES Research Conference

Hard Answers to Easy Questions


Easy Question

Isn’t it ok if I just match (schools) on some variable before randomizing?

(You know lots of people do it)

This is a simple question, but giving it an answer requires serious thinking about design and analysis

What Does this Question Mean?

Generally adding matching or blocking variables means adding another (blocking) factor to the design

The exact consequences depend on the design you started with:

• Individually randomized (completely randomized design)

• Cluster randomized (hierarchical design)

• Multicenter or matched (randomized blocks design)

Individually Randomized (Completely Randomized) Design

In this case you are adding a blocking factor crossed with treatment (p blocks)

In other words, the design becomes a (generalized) randomized block design

           Blocks
       1    2    …    p
T
C


How does this impact the analysis?

Think about a balanced design with 2n students per block and p blocks and the ANOVA partitioning of sums of squares and degrees of freedom

Original partitioning

SSTotal = SST + SSWT

dfTotal = dfT + dfWT

2pn – 1 = 1 + (2pn – 2)

Original test statistic

F = SST/(SSWT/dfWT)


New partitioning

SSTotal = SST + SSB + SSBxT + SSWC

dfTotal = dfT + dfB + dfBxT + dfWC

2pn – 1 = 1 + (p – 1) + (p – 1) + 2p(n – 1)

New test statistic ?

F = SST/(SSWC/dfWC)

Or

F = SST/(SSBxT/dfBxT)

It depends on the inference model


            Original Design      Blocked Design
SS          SST + SSWT           SST + (SSB + SSBxT + SSWC)
df          dfT + dfWT           dfT + (dfB + dfBxT + dfWC)
2pn – 1     1 + (2pn – 2)        1 + (p – 1) + (p – 1) + 2p(n – 1)
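The degrees-of-freedom bookkeeping in the two partitionings can be checked mechanically. The following is a minimal sketch assuming a balanced design; the function names and the example sizes are illustrative and do not come from the talk.

```python
def df_original(p: int, n: int) -> dict[str, int]:
    """df for the unblocked analysis of 2pn students: treatment and within treatment."""
    return {"T": 1, "WT": 2 * p * n - 2}


def df_blocked(p: int, n: int) -> dict[str, int]:
    """df after adding p blocks crossed with treatment, with n students per cell."""
    return {"T": 1, "B": p - 1, "BxT": p - 1, "WC": 2 * p * (n - 1)}


# Hypothetical sizes: 10 blocks, 20 students per treatment arm in each block.
p, n = 10, 20
total_df = 2 * p * n - 1

# Both partitionings must account for the same total df.
assert sum(df_original(p, n).values()) == total_df
assert sum(df_blocked(p, n).values()) == total_df

# Error df available to each candidate denominator.
print(df_original(p, n)["WT"], df_blocked(p, n)["WC"], df_blocked(p, n)["BxT"])
# prints: 398 380 9
```

The gap between 380 and 9 error df is what makes the choice of denominator, and hence the inference model, matter so much.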

Inference Models

I will mention two inference models

• Conditional inference model

• Unconditional inference model

These inference models determine the type of inference (generalization) you wish to make

Inference model chosen has implications for the statistical analysis procedure chosen

The inference model determines the natural random effects

Inference Models

Conditional Inference Model

Generalization is to the blocks actually in the experiment (or those just like them)

Blocks in the experiment are the universe (population)

Generalization to other blocks depends on extra-statistical considerations (which blocks are just like them? How do you know?)

Generalization obviously cannot be model free

Inference Models

Unconditional Inference Model

Generalization is to a universe (of blocks) including blocks not in the experiment

Blocks in the experiment are a sample of blocks in the universe (population)

If blocks in the experiment can be considered a representative sample, inference to the population of blocks is by sampling theory

If blocks are not a probability sample, generalization gets tricky (what is the universe? How do you know?)

Inference Models

You can think of the inference model as linked to the sampling model for blocks

If the blocks observed are a (random) sample of blocks, then they are a source of random variation

If blocks observed are the entire universe of relevant blocks, then they are not a source of random variation

The statistical analysis can be chosen independently of the inference model, but if it doesn’t include all sources of random variation, inferences will be compromised

Inference Models and Statistical Analyses Individually Randomized Design

Blocks are fixed effects under the conditional inference model

In this case the correct test statistic is

FC = SST/(SSWC/dfWC)

and the F-distribution has 1 & 2p(n – 1) df

Block effects are random under the unconditional inference model

In this case the correct test statistic is

FU = SST/(SSBxT/dfBxT)

and the F-distribution has 1 & (p – 1) df
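To make the two test statistics concrete, here is a minimal sketch that partitions the sums of squares for a balanced blocked design and forms both FC and FU. The data are simulated; the effect size, the variance components and the function name `blocked_anova` are assumptions chosen only for illustration.

```python
import random


def blocked_anova(y):
    """y[i][j] is the list of outcomes in treatment arm i (0 = T, 1 = C) and block j."""
    arms, p, n = len(y), len(y[0]), len(y[0][0])

    def mean(xs):
        return sum(xs) / len(xs)

    cell = [[mean(y[i][j]) for j in range(p)] for i in range(arms)]
    arm = [mean(cell[i]) for i in range(arms)]  # balanced, so mean of cell means
    block = [mean([cell[i][j] for i in range(arms)]) for j in range(p)]
    grand = mean(arm)
    cells = [(i, j) for i in range(arms) for j in range(p)]

    ss_t = p * n * sum((a - grand) ** 2 for a in arm)
    ss_b = arms * n * sum((b - grand) ** 2 for b in block)
    ss_bxt = n * sum((cell[i][j] - arm[i] - block[j] + grand) ** 2 for i, j in cells)
    ss_wc = sum((v - cell[i][j]) ** 2 for i, j in cells for v in y[i][j])
    ss_total = sum((v - grand) ** 2 for i, j in cells for v in y[i][j])

    df_bxt = (arms - 1) * (p - 1)   # p - 1 when there are two arms
    df_wc = arms * p * (n - 1)      # 2p(n - 1) when there are two arms
    return {
        "SS_total": ss_total, "SS_T": ss_t, "SS_B": ss_b,
        "SS_BxT": ss_bxt, "SS_WC": ss_wc,
        "F_C": ss_t / (ss_wc / df_wc),     # conditional: blocks fixed
        "F_U": ss_t / (ss_bxt / df_bxt),   # unconditional: blocks random
    }


# Simulated data (all parameter values are hypothetical).
random.seed(0)
p, n = 8, 25
block_effect = [random.gauss(0, 0.5) for _ in range(p)]
y = []
for i in range(2):
    row = []
    for j in range(p):
        # treatment effect + block effect + block-by-treatment interaction
        shift = (0.4 if i == 0 else 0.0) + block_effect[j] + random.gauss(0, 0.3)
        row.append([shift + random.gauss(0, 1) for _ in range(n)])
    y.append(row)

out = blocked_anova(y)
parts = out["SS_T"] + out["SS_B"] + out["SS_BxT"] + out["SS_WC"]
assert abs(out["SS_total"] - parts) < 1e-6  # the partition is exact when balanced
print(round(out["F_C"], 2), round(out["F_U"], 2))
```

The same numerator SST is divided by two different mean squares, so the two analyses can disagree on the same data.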

Inference Models and Statistical Analyses Individually Randomized Design

You can see that the error term in the test has (a lot) more df under the fixed effects model: 2p(n – 1) versus (p – 1)

What you can’t see is that (if there is a treatment effect) the average value of the F-statistic is typically also larger under the fixed effects model

It is bigger by a factor proportional to

(1 – ρ + nωρ) / (1 – ρ)

where ω = σBxT²/σB² is a treatment heterogeneity parameter and ρ is the intraclass correlation
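A quick numerical look at this factor shows how fast it grows. This sketch assumes the factor has the form (1 – ρ + nωρ)/(1 – ρ), which is what the standard expected mean squares for a balanced design give (the ratio of the expected BxT mean square to the expected within-cell mean square); the function name and the parameter values are illustrative only.

```python
def expected_f_inflation(rho: float, omega: float, n: int) -> float:
    """Assumed factor (1 - rho + n*omega*rho) / (1 - rho).

    rho   : intraclass correlation (block share of total variance)
    omega : treatment heterogeneity, sigma_BxT^2 / sigma_B^2
    n     : students per treatment arm within each block
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    return (1 - rho + n * omega * rho) / (1 - rho)


# Hypothetical values: the factor is 1 when rho = 0 and grows with n, omega and rho.
for rho in (0.0, 0.1, 0.2):
    print(rho, round(expected_f_inflation(rho, omega=0.5, n=20), 2))
```

Under this form the factor equals 1 when there is no block-by-treatment variation, which is when the two analyses target the same error variance.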

Possible Statistical Analyses Individually Randomized Design

Possible statistical analyses

1. Ignore the blocking

2. Include blocks as fixed effects

3. Include blocks as random effects

Consequences depend on whether you want to make a conditional or unconditional inference

Making Unconditional Inferences Individually Randomized Design


1. Ignore the blocking
Bad idea: Will inflate significance levels of tests for treatment effects substantially

2. Include blocks as fixed effects
Bad idea: Will inflate significance levels of tests for treatment effects substantially

3. Include blocks as random effects
Correct significance levels (but less power than conditional analysis)

Making Conditional Inferences Individually Randomized Design


1. Ignore the blocking
Bad idea: May deflate actual significance levels of tests for treatment effects substantially (unless ρ = 0)

2. Include blocks as fixed effects
Correct significance levels and more powerful test than for unconditional analysis

3. Include blocks as random effects
Bad idea: May deflate significance levels and reduce power

Cluster Randomized (Hierarchical) Design

The issues about blocking in the cluster randomized design are the same as in the individually randomized design

The inference model will determine the most appropriate statistical analysis

Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose

For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments


In this case you are adding a blocking factor crossed with treatment (p blocks) but clusters are still nested within treatments [here Cij is the jth cluster in the ith block]

Note that there are m clusters in each treatment per block

          Block 1                              …    Block p
     C11, …, C1m   C1(m+1), …, C1(2m)          …    Cp1, …, Cpm   Cp(m+1), …, Cp(2m)
T         x              ---                   …         x              ---
C        ---              x                    …        ---              x

(x = clusters observed under that condition; --- = no observations)



Think about a balanced design with 2mn students per block and p blocks and the ANOVA partitioning of sums of squares and degrees of freedom

Original partitioning

SSTotal = SST + SSC + SSWC:T

dfTotal = dfT + dfC + dfWC:T

2mn – 1 = 1 + 2(m – 1) + 2m(n – 1)

Original test statistic

F = SST/(SSC/dfC)


New partitioning

SSTotal = SST + SSB + SSBxT + SSC:BxT + SSWC

dfTotal = dfT + dfB + dfBxT + dfC:BxT + dfWC

2mpn – 1 = 1 + (p – 1) + (p – 1) + 2p(m – 1) + 2pm(n – 1)


New test statistic ?

F = SST/(SSWC/dfWC)

Or

F = SST/(SSC:BxT/dfC:BxT)

Inference Models and Statistical Analyses Cluster Randomized Design

Blocks are fixed under the conditional inference model, but clusters are typically random

In this case the correct test statistic is

FC = SST/(SSC:BxT/dfC:BxT)

and the F-distribution has 1 & 2p(m – 1) df

Blocks are random under the unconditional inference model, and clusters are typically random

In this case there is no exact ANOVA test if there are block-treatment interactions, but a conservative test uses the test statistic

FU = SST/(SSB/dfB)

and the F-distribution has 1 & (p – 1) df (large sample tests, e.g., based on HLM, are available)

Inference Models and Statistical Analyses Cluster Randomized Design

You can see that the error term has more df under the fixed effects model

If there is a treatment effect the average value of the F-statistic is also larger under the fixed effects model

It is bigger by a factor proportional to

(1 – ρB – ρC + nρC + mnωBρB) / (1 – ρB – ρC + nρC)

where ωB = σBxT²/σB² is a treatment heterogeneity parameter and ρB and ρC are the block and cluster level intraclass correlations, respectively

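Possible Statistical Analyses Cluster Randomized Design

Possible statistical analyses

1. Ignore the blocking

2. Include blocks as fixed effects

3. Include blocks as random effects

Consequences depend on whether you want to make a conditional or unconditional inference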
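Making Unconditional Inferences Cluster Randomized Design

1. Ignore the blocking
Bad idea: Will inflate significance levels of tests for treatment effects substantially

2. Include blocks as fixed effects
Bad idea: Will inflate significance levels of tests for treatment effects substantially

3. Include blocks as random effects
Correct significance levels but less power than conditional analysis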

Making Conditional Inferences Cluster Randomized Design

1. Ignore the blocking
Bad idea: May deflate actual significance levels of tests for treatment effects substantially

2. Include blocks as fixed effects
Correct significance levels and more powerful test than for unconditional analysis

3. Include blocks as random effects
Not such a bad idea: significance levels unaffected

Multi-center (Randomized Blocks) Design

The issues about blocking in the multicenter (randomized blocks) design are the same as in the cluster randomized design

The inference model will determine the most appropriate statistical analysis

Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose

For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments


In this case you are adding a blocking factor crossed with treatment (p blocks) and clusters, but clusters are still nested within blocks [here Cij is the jth cluster in the ith block]

Note that there are m clusters in each treatment per block and n individuals in each treatment in each cluster

          Block 1          …        Block p
     C11    …    C1m       …   Cp1    …    Cpm
T           …              …          …
C           …              …          …

(every cluster contributes observations under both T and C)



Think about a balanced design with 2mn students per block, p blocks, and n individuals per cell, and the ANOVA partitioning of sums of squares and degrees of freedom

Original partitioning

SSTotal = SST + SSC + SSTxC + SSWC

dfTotal = dfT + dfC + dfTxC + dfWC

2pmn – 1 = 1 + (pm – 1) + (pm – 1) + 2pm(n – 1)

Original test statistic

F = SST/(SSTxC/dfTxC)


New partitioning

SSTotal = SST + SSB + SSC:B + SSBxT + SSC:BxT + SSWC

dfTotal = dfT + dfB + dfC:B + dfBxT + dfC:BxT + dfWC

2mpn – 1 = 1 + (p – 1) + p(m – 1) + (p – 1) + p(m – 1) + 2pm(n – 1)

New test statistic ?

F = SST/(SSWC/dfWC)
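The df counts for the two blocked cluster designs differ only in how clusters relate to treatment, and the arithmetic is easy to verify. A minimal sketch, assuming balanced designs; the function names and sizes are illustrative. In the cluster randomized design the 2m clusters in a block are nested within treatment, so the cluster term carries 2p(m – 1) df. In the randomized blocks design the m clusters in a block are crossed with treatment, so the df split into p(m – 1) for C:B and p(m – 1) for C:BxT, the latter matching the denominator df of the conditional test.

```python
def df_cluster_randomized(p: int, m: int, n: int) -> dict[str, int]:
    """Blocks crossed with treatment; clusters nested within block-by-treatment cells."""
    return {
        "T": 1,
        "B": p - 1,
        "BxT": p - 1,
        "C:BxT": 2 * p * (m - 1),
        "WC": 2 * p * m * (n - 1),
    }


def df_randomized_blocks(p: int, m: int, n: int) -> dict[str, int]:
    """Clusters nested within blocks and crossed with treatment."""
    return {
        "T": 1,
        "B": p - 1,
        "C:B": p * (m - 1),
        "BxT": p - 1,
        "C:BxT": p * (m - 1),
        "WC": 2 * p * m * (n - 1),
    }


# Hypothetical sizes: 6 blocks, m = 4, n = 15.
p, m, n = 6, 4, 15
total_df = 2 * m * p * n - 1

# Each partitioning must add up to the total df.
assert sum(df_cluster_randomized(p, m, n).values()) == total_df
assert sum(df_randomized_blocks(p, m, n).values()) == total_df
print(total_df)  # prints: 719
```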



Inference Models and Statistical Analyses Randomized Blocks Design

Blocks are fixed under the conditional inference model, but clusters are typically random

In this case the correct test statistic is

FC = SST/(SSC:BxT/dfC:BxT)

and the F-distribution has 1 & p(m – 1) df

Blocks are random under the unconditional inference model, and clusters are typically random

In this case the correct test statistic is

FU = SST/(SSBxT/dfBxT)

and the F-distribution has 1 & (p – 1) df

Inference Models and Statistical Analyses Randomized Blocks Design

You can see that the error term has more df under the fixed effects model

If there is a treatment effect the average value of the F-statistic is also larger under the fixed effects model

It is bigger by a factor proportional to

(1 – ρB – ρC + nωCρC + mnωBρB) / (1 – ρB – ρC + nωCρC)

where ωB = σBxT²/σB² and ωC = σCxT²/σC² are treatment heterogeneity parameters and ρB and ρC are the block and cluster level intraclass correlations, respectively
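The size of this factor for plausible design parameters can be tabulated directly. The sketch below assumes the factor has the form (1 – ρB – ρC + nωCρC + mnωBρB)/(1 – ρB – ρC + nωCρC), the ratio of the expected BxT mean square to the expected C:BxT mean square in a balanced design. The function name and all numeric values are illustrative assumptions.

```python
def expected_f_inflation_blocks(
    rho_b: float, rho_c: float, omega_b: float, omega_c: float, m: int, n: int
) -> float:
    """Assumed ratio of expected error mean squares (unconditional / conditional).

    rho_b, rho_c     : block and cluster level intraclass correlations
    omega_b, omega_c : treatment heterogeneity at the block and cluster level
    m, n             : clusters per block and individuals per cell
    """
    residual = 1 - rho_b - rho_c                 # within-cluster share of variance
    conditional = residual + n * omega_c * rho_c
    unconditional = conditional + m * n * omega_b * rho_b
    return unconditional / conditional


# Hypothetical values: more clusters per block make the gap larger.
for m in (2, 4, 8):
    factor = expected_f_inflation_blocks(0.1, 0.1, 0.5, 0.5, m=m, n=20)
    print(m, round(factor, 2))
```

Setting `omega_c = 1` in this function gives the corresponding expression for the cluster randomized design under the same assumption.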

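Possible Statistical Analyses Randomized Blocks Design

Possible statistical analyses

1. Ignore the blocking

2. Include blocks as fixed effects

3. Include blocks as random effects

Consequences depend on whether you want to make a conditional or unconditional inference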
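Making Unconditional Inferences Randomized Blocks Design

1. Ignore the blocking
Bad idea: Will inflate significance levels of tests for treatment effects substantially

2. Include blocks as fixed effects
Bad idea: Will inflate significance levels of tests for treatment effects substantially

3. Include blocks as random effects
Correct significance levels but less power than conditional analysis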

Making Conditional Inferences Randomized Blocks Design

1. Ignore the blocking
Bad idea: May deflate actual significance levels of tests for treatment effects substantially

2. Include blocks as fixed effects
Correct significance levels and more powerful test than for unconditional analysis

3. Include blocks as random effects
Bad idea: May deflate significance levels and reduce power

Another Easy Question

There was some attrition from my study after assignment. Does that cause a serious problem?

This is another simple question, but the answer is far from simple. One answer can be framed using concepts of experimental design

Post Assignment Attrition

A different question has a simple answer:

Does that (attrition) cause a problem in principle?

The simple answer to that question is YES!

Randomized experiments with attrition no longer give model free, unbiased estimates of the causal effect of treatment

Whether the bias is serious or not depends (on the model that generates the missing data)


The design is changed by adding a crossed factor corresponding to missingness, like this:

       Observed    Missing
T
C

Now we can see a problem with estimating the treatment effect from only the observed part of the design: the observed treatment effect is only part of the total treatment effect


Suppose that the means are given by the μ’s and the proportions are given by the π’s

          Observed                Missing
       Proportion    Mean      Proportion    Mean
T      1 – πT        μTO       πT            μTM
C      1 – πC        μCO       πC            μCM


The treatment effect on all individuals randomized is

(1 – πT)μTO + πTμTM – (1 – πC)μCO – πCμCM

When the proportion of dropouts is equal in T and C, so that

πT = πC = π

the mean of the treatment effect on all individuals randomized is

(1 – π)(μTO – μCO) + π(μTM – μCM)


Rewriting this we see that the average treatment effect for individuals assigned to treatment is

δ = (1 – π)δO + πδM

where δO is the treatment effect among the individuals that are observed, δM is the treatment effect among the individuals that are not observed, and δ is the treatment effect among all individuals assigned

Thus bounds on δM imply bounds on δ


No estimate of the treatment effect is possible without an estimate of the treatment effect among the missing individuals

One possibility is to model (assume) that we know something about the treatment effect in the missing individuals

We can assume a range of values to get bounds on the possible treatment effect
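The bounding argument with equal attrition can be sketched in a few lines. This assumes the decomposition δ = (1 – π)δO + πδM, with π the common dropout proportion; the function names and the numbers are hypothetical and serve only to show the mechanics.

```python
def overall_effect(delta_obs: float, delta_mis: float, pi: float) -> float:
    """Effect among all assigned: (1 - pi) * delta_obs + pi * delta_mis."""
    return (1 - pi) * delta_obs + pi * delta_mis


def effect_bounds(delta_obs: float, pi: float, mis_low: float, mis_high: float):
    """Bounds on the overall effect implied by an assumed range for delta_mis."""
    return (
        overall_effect(delta_obs, mis_low, pi),
        overall_effect(delta_obs, mis_high, pi),
    )


# Hypothetical case: observed effect 23, 20% dropout in both arms, and an
# assumption that the effect among dropouts lies between 0 and 23.
low, high = effect_bounds(delta_obs=23.0, pi=0.2, mis_low=0.0, mis_high=23.0)
print(round(low, 1), round(high, 1))
```

The width of the interval is π times the width of the assumed range for δM, so small attrition or tight assumptions both narrow the bounds.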


When attrition rate is not the same in the treatment groups (πT ≠ πC) the analysis is trickier

One idea is to convince ourselves that the treatment effect for those who drop out is the same as those who do not

       Observed    Missing
       Mean        Mean
T      90          33
C      67          10
T-C    23          23


This does not assure that attrition has not altered the treatment effect


We have to know both μTM and μCM to identify the treatment effect, knowing δM = (μTM – μCM) is not enough

         Observed        Missing         Total
         n     Mean      n     Mean      n      Mean
T        10    90        90    33        100    39
C        90    67        10    10        100    61
T-C            23              23               -23
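The totals in this table follow from weighting each arm's observed and missing means by their sample sizes. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def arm_mean(n_obs: int, mean_obs: float, n_mis: int, mean_mis: float) -> float:
    """Mean over everyone assigned to an arm, observed and missing combined."""
    return (n_obs * mean_obs + n_mis * mean_mis) / (n_obs + n_mis)


# Values taken from the table above.
t_total = arm_mean(10, 90, 90, 33)   # 38.7, shown as 39
c_total = arm_mean(90, 67, 10, 10)   # 61.3, shown as 61

# The effect is +23 in each column, yet the overall difference is negative.
print(round(t_total), round(c_total), round(t_total - c_total))
# prints: 39 61 -23
```

The sign flips because the two arms weight their observed and missing groups very differently (10/90 versus 90/10), even though the treatment effect is identical within each group.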


Suppose that B^L_TM and B^L_CM are lower bounds on the means for missing individuals in the treatment and control groups and B^U_TM and B^U_CM are the upper bounds

Then the upper and lower bounds on the treatment effect are

Lower: (1 – πT)μTO + πT B^L_TM – (1 – πC)μCO – πC B^U_CM

Upper: (1 – πT)μTO + πT B^U_TM – (1 – πC)μCO – πC B^L_CM
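These bounds are simple to compute once ranges for the missing means are assumed. The sketch below takes πT and πC to be the dropout proportions in each arm and pairs the lowest treatment mean with the highest control mean for the lower bound (and the reverse for the upper bound). The function name and the 0 to 100 outcome range are assumptions for illustration; the observed means and dropout rates reuse the unequal-attrition example.

```python
def treatment_effect_bounds(pi_t, pi_c, mu_to, mu_co, tm_range, cm_range):
    """Lower and upper bounds on the effect among all individuals randomized.

    pi_t, pi_c   : dropout proportions in treatment and control
    mu_to, mu_co : observed means in treatment and control
    tm_range     : (lower, upper) bounds on the missing treatment mean
    cm_range     : (lower, upper) bounds on the missing control mean
    """
    tm_low, tm_high = tm_range
    cm_low, cm_high = cm_range
    observed_part = (1 - pi_t) * mu_to - (1 - pi_c) * mu_co
    lower = observed_part + pi_t * tm_low - pi_c * cm_high
    upper = observed_part + pi_t * tm_high - pi_c * cm_low
    return lower, upper


# Hypothetical assumption: the outcome is a score that must lie in [0, 100].
lower, upper = treatment_effect_bounds(
    pi_t=0.9, pi_c=0.1, mu_to=90, mu_co=67,
    tm_range=(0, 100), cm_range=(0, 100),
)
print(round(lower, 1), round(upper, 1))
```

With 90% dropout in the treatment arm the interval is very wide, which is the point: heavy attrition leaves the data nearly uninformative unless strong assumptions are made about the missing means.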


Note that none of the results on attrition involve sampling or estimation error

Results get more complex if we take this into account, but the basic ideas are those here

Conclusions

Many simple questions arise in connection with field experiments

The answers to these questions often require thinking through complex aspects of

• the design

• the inference model

• assumptions about missing data

No correct answers are possible without recognizing these complexities
