Problems with the Design and Implementation
of Randomized Experiments
By Larry V. Hedges
Northwestern University
Presented at the 2009 IES Research Conference
Hard Answers to Easy Questions
Easy Question
Isn’t it ok if I just match (schools) on some variable before randomizing?
(You know lots of people do it)
This is a simple question, but answering it requires serious thinking about design and analysis
What Does this Question Mean?
Generally, adding matching or blocking variables means adding another (blocking) factor to the design
The exact consequences depend on the design you started with:
• Individually randomized (completely randomized design)
• Cluster randomized (hierarchical design)
• Multicenter or matched (randomized blocks design)
Individually Randomized (Completely Randomized) Design
In this case you are adding a blocking factor crossed with treatment (p blocks)
In other words, the design becomes a (generalized) randomized block design
     Block 1   Block 2   …   Block p
T
C
Individually Randomized (Completely Randomized) Design
How does this impact the analysis?
Think about a balanced design with p blocks and 2n students per block (n per treatment), and about the ANOVA partitioning of sums of squares and degrees of freedom
Original partitioning
SSTotal = SST + SSWT
dfTotal = dfT + dfWT
2pn – 1 = 1 + (2pn – 2)
Original test statistic
F = SST/(SSWT/dfWT)
Individually Randomized (Completely Randomized) Design
New partitioning
SSTotal = SST + SSB + SSBxT + SSWC
dfTotal = dfT + dfB + dfBxT + dfWC
2pn – 1 = 1 + (p – 1) + (p – 1) + 2p(n – 1)
New test statistic?
F = SST/(SSWC/dfWC)
Or
F = SST/(SSBxT/dfBxT)
It depends on the inference model
Individually Randomized (Completely Randomized) Design
Original design
SS = SST + SSWT
df = dfT + dfWT
2pn – 1 = 1 + (2pn – 2)
Blocked design
SS = SST + (SSB + SSBxT + SSWC)
df = dfT + (dfB + dfBxT + dfWC)
2pn – 1 = 1 + (p – 1) + (p – 1) + 2p(n – 1)
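A minimal Python sketch of the partitioning above, assuming a balanced layout in which `y[t][b]` holds the n scores for treatment t (0 = T, 1 = C) in block b. The function name `block_design_ss` and this data layout are illustrative assumptions, not from the talk. It computes each sum of squares so you can check that blocking only splits the within-treatment term, SSWT = SSB + SSBxT + SSWC, while the df add up the same way.

```python
import random


def block_design_ss(y):
    """Sums of squares and df for a balanced randomized block design.

    y[t][b] holds the n scores for treatment t (0 = T, 1 = C) in block b.
    Hypothetical helper written for illustration.
    """
    p, n = len(y[0]), len(y[0][0])
    cells = [(t, b) for t in range(2) for b in range(p)]
    scores = [v for t, b in cells for v in y[t][b]]
    grand = sum(scores) / len(scores)
    cell = [[sum(y[t][b]) / n for b in range(p)] for t in range(2)]
    tmean = [sum(cell[t]) / p for t in range(2)]               # treatment means
    bmean = [(cell[0][b] + cell[1][b]) / 2 for b in range(p)]  # block means

    ss = {
        "Total": sum((v - grand) ** 2 for v in scores),
        "T": p * n * sum((m - grand) ** 2 for m in tmean),
        "WT": sum((v - tmean[t]) ** 2 for t, b in cells for v in y[t][b]),
        "B": 2 * n * sum((m - grand) ** 2 for m in bmean),
        "BxT": n * sum((cell[t][b] - tmean[t] - bmean[b] + grand) ** 2 for t, b in cells),
        "WC": sum((v - cell[t][b]) ** 2 for t, b in cells for v in y[t][b]),
    }
    df = {"Total": 2 * p * n - 1, "T": 1, "WT": 2 * p * n - 2,
          "B": p - 1, "BxT": p - 1, "WC": 2 * p * (n - 1)}
    return ss, df


# Usage: p = 4 blocks, n = 5 scores per cell; block b shifts scores by b
random.seed(1)
y = [[[random.gauss(b, 1.0) for _ in range(5)] for b in range(4)] for _ in range(2)]
ss, df = block_design_ss(y)
# Blocking splits the within-treatment term: SSWT = SSB + SSBxT + SSWC
print(abs(ss["WT"] - (ss["B"] + ss["BxT"] + ss["WC"])) < 1e-9)
```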
Inference Models
I will mention two inference models:
• Conditional inference model
• Unconditional inference model
These inference models determine the type of inference (generalization) you wish to make
Inference model chosen has implications for the statistical analysis procedure chosen
The inference model determines the natural random effects
Inference Models
Conditional Inference Model
Generalization is to the blocks actually in the experiment (or those just like them)
Blocks in the experiment are the universe (population)
Generalization to other blocks depends on extra-statistical considerations (which blocks are just like them? How do you know?)
Generalization obviously cannot be model-free
Inference Models
Unconditional Inference Model
Generalization is to a universe (of blocks) including blocks not in the experiment
Blocks in the experiment are a sample of blocks in the universe (population)
If blocks in the experiment can be considered a representative sample, inference to the population of blocks is by sampling theory
If blocks are not a probability sample, generalization gets tricky (what is the universe? How do you know?)
Inference Models
You can think of the inference model as linked to the sampling model for blocks
If the blocks observed are a (random) sample of blocks, then they are a source of random variation
If blocks observed are the entire universe of relevant blocks, then they are not a source of random variation
The statistical analysis can be chosen independently of the inference model, but if it doesn’t include all sources of random variation, inferences will be compromised
Inference Models and Statistical Analyses Individually Randomized Design
Blocks are fixed effects under the conditional inference model
In this case the correct test statistic is
FC = SST/(SSWC/dfWC)
and the F-distribution has 1 & 2p(n -1) df
Block effects are random under the unconditional inference model
In this case the correct test statistic is
FU = SST/(SSBxT/dfBxT)
and the F-distribution has 1 & (p -1) df
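The two test statistics differ only in their error term. A minimal sketch, assuming the same balanced layout `y[t][b]` (a hypothetical data structure; `block_f_tests` is likewise an illustrative name), computes both with their df. P-values would need an F-distribution routine such as SciPy's `scipy.stats.f.sf`, omitted here to keep the example dependency-free.

```python
def block_f_tests(y):
    """Conditional (F_C) and unconditional (F_U) treatment tests, balanced blocks.

    y[t][b] holds the n scores for treatment t (0 = T, 1 = C) in block b.
    Returns {"F_C": (F, (1, df)), "F_U": (F, (1, df))}. Hypothetical helper.
    """
    p, n = len(y[0]), len(y[0][0])
    cell = [[sum(y[t][b]) / n for b in range(p)] for t in range(2)]
    tmean = [sum(cell[t]) / p for t in range(2)]
    bmean = [(cell[0][b] + cell[1][b]) / 2 for b in range(p)]
    grand = sum(tmean) / 2

    ss_t = p * n * sum((m - grand) ** 2 for m in tmean)
    ss_bxt = n * sum((cell[t][b] - tmean[t] - bmean[b] + grand) ** 2
                     for t in range(2) for b in range(p))
    ss_wc = sum((v - cell[t][b]) ** 2
                for t in range(2) for b in range(p) for v in y[t][b])
    df_wc, df_bxt = 2 * p * (n - 1), p - 1

    return {
        # blocks fixed (conditional inference): error term is within cells
        "F_C": (ss_t / (ss_wc / df_wc), (1, df_wc)),
        # blocks random (unconditional inference): error term is block x treatment
        "F_U": (ss_t / (ss_bxt / df_bxt), (1, df_bxt)),
    }


# Usage: p = 2 blocks, n = 2 scores per cell
y = [[[3, 5], [7, 9]],   # T in blocks 1 and 2
     [[1, 3], [2, 6]]]   # C in blocks 1 and 2
print(block_f_tests(y))  # F_C = 18 / (14 / 4) on (1, 4) df; F_U = 18 / (2 / 1) = 9.0 on (1, 1) df
```

In this tiny example F_U happens to exceed F_C; the claim that the fixed effects F is larger concerns its average value when there is a treatment effect, not every sample.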
Inference Models and Statistical Analyses Individually Randomized Design
You can see that the error term in the test has (a lot) more df under the fixed effects model: 2p(n – 1) versus (p – 1)
What you can’t see is that (if there is a treatment effect) the average value of the F-statistic is typically also larger under the fixed effects model
It is bigger by a factor proportional to
$$\frac{1 - \rho + n\omega\rho}{1 - \rho}$$
where $\omega = \sigma^2_{B\times T}/\sigma^2_B$ is a treatment heterogeneity parameter and $\rho$ is the intraclass correlation
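One way to read the factor is as a ratio of expected error mean squares. The sketch below assumes the usual balanced-design expectations E[MS_WC] = σ²_W and E[MS_BxT] = σ²_W + nσ²_BxT, with ρ = σ²_B/(σ²_B + σ²_W); these variance-component definitions, the function names, and the numbers in the usage lines are illustrative assumptions rather than material from the slides.

```python
def factor_from_parameters(n, omega, rho):
    """Reconstructed factor (1 - rho + n*omega*rho) / (1 - rho)."""
    return (1 - rho + n * omega * rho) / (1 - rho)


def factor_from_components(n, var_bxt, var_w):
    """E[MS_BxT] / E[MS_WC] under the assumed balanced-design EMS."""
    ems_wc = var_w                 # within-cell error: individual variation only
    ems_bxt = var_w + n * var_bxt  # block x treatment error adds n * sigma^2_BxT
    return ems_bxt / ems_wc


# Usage (assumed values): sigma^2_B = 0.2, sigma^2_W = 0.8, sigma^2_BxT = 0.05, n = 20
var_b, var_w, var_bxt, n = 0.2, 0.8, 0.05, 20
rho = var_b / (var_b + var_w)   # assumed definition of the intraclass correlation
omega = var_bxt / var_b         # treatment heterogeneity parameter
print(factor_from_parameters(n, omega, rho), factor_from_components(n, var_bxt, var_w))
```

Both functions give about 2.25 here; with ω = 0 (no block-by-treatment variation) the factor is exactly 1.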
Possible Statistical Analyses Individually Randomized Design
Possible statistical analyses
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference
Making Unconditional Inferences Individually Randomized Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
2. Include blocks as fixed effects. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
3. Include blocks as random effects. Correct significance levels (but less power than the conditional analysis)
Making Conditional Inferences Individually Randomized Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: may substantially deflate the actual significance levels of tests for treatment effects (unless ρ = 0)
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than the unconditional analysis
3. Include blocks as random effects. Bad idea: may deflate significance levels and reduce power
Cluster Randomized (Hierarchical) Design
The issues about blocking in the cluster randomized design are the same as in the individually randomized design
The inference model will determine the most appropriate statistical analysis
Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose
For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments
Cluster Randomized (Hierarchical) Design
In this case you are adding a blocking factor crossed with treatment (p blocks), but clusters are still nested within treatments [here Cij is the jth cluster in the ith block]
Note that there are m clusters in each treatment per block
In block i, clusters Ci1, …, Cim are in T and clusters Ci(m+1), …, Ci(2m) are in C
     Block 1                              …   Block p
     C11, …, C1m   C1(m+1), …, C1(2m)     …   Cp1, …, Cpm   Cp(m+1), …, Cp(2m)
T    ---                                  …   ---
C                  ---                    …                 ---
Cluster Randomized (Hierarchical) Design
How does this impact the analysis?
Think about a balanced design with p blocks, m clusters per treatment in each block, and n students per cluster (2mn students per block), and about the ANOVA partitioning of sums of squares and degrees of freedom
Original partitioning
SSTotal = SST + SSC + SSWC:T
dfTotal = dfT + dfC + dfWC:T
2pmn – 1 = 1 + 2(pm – 1) + 2pm(n – 1)
Original test statistic
F = SST/(SSC/dfC)
Cluster Randomized (Hierarchical) Design
New partitioning
SSTotal = SST + SSB + SSBxT + SSC:BxT + SSWC
dfTotal = dfT + dfB + dfBxT + dfC:BxT + dfWC
2mpn – 1 = 1+ (p – 1) +(p – 1) +2p(m – 1) +2pm (n – 1)
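A quick arithmetic check of the degrees of freedom, assuming the unblocked comparison design uses the same 2pm clusters (pm per treatment) and n students per cluster. The helper names are hypothetical. The check also shows that blocking splits the between-cluster df into the B, BxT, and C:BxT terms.

```python
def unblocked_cluster_df(p, m, n):
    """Unblocked design (assumed): pm clusters per treatment, n per cluster."""
    return {"T": 1, "C": 2 * (p * m - 1), "WC:T": 2 * p * m * (n - 1)}


def blocked_cluster_df(p, m, n):
    """Blocked design: p blocks, m clusters per treatment per block, n per cluster."""
    return {"T": 1, "B": p - 1, "BxT": p - 1,
            "C:BxT": 2 * p * (m - 1), "WC": 2 * p * m * (n - 1)}


for p, m, n in [(2, 3, 10), (5, 4, 25), (10, 2, 8)]:
    orig, new = unblocked_cluster_df(p, m, n), blocked_cluster_df(p, m, n)
    total = 2 * p * m * n - 1
    assert sum(orig.values()) == total and sum(new.values()) == total
    # blocking splits the between-cluster df into B, BxT, and C:BxT
    assert orig["C"] == new["B"] + new["BxT"] + new["C:BxT"]
    print(p, m, n, orig["C"], "->", new["B"], new["BxT"], new["C:BxT"])
```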
New test statistic?
F = SST/(SSWT/dfWT)
F = SST/(SSC:BxT/dfC:BxT)
Inference Models and Statistical Analyses Cluster Randomized Design
Blocks are fixed under the conditional inference model, but clusters are typically random
In this case the correct test statistic is
FC = SST/(SSC:BxT/dfC:BxT)
and the F-distribution has 1 & 2p(m – 1) df
Blocks are random under the unconditional inference model, and clusters are typically random as well
In this case there is no exact ANOVA test if there are block-by-treatment interactions, but a conservative test uses the test statistic
FU = SST/(SSB/dfB)
and the F-distribution has 1 & (p – 1) df (large sample tests, e.g., based on HLM, are available)
Inference Models and Statistical Analyses Cluster Randomized Design
You can see that the error term has more df under the fixed effects model
If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model
It is bigger by a factor proportional to
$$\frac{1 - \rho_B - \rho_C + n\rho_C + mn\,\omega_B\rho_B}{1 - \rho_B - \rho_C + n\rho_C}$$
where $\omega_B = \sigma^2_{B\times T}/\sigma^2_B$ is a treatment heterogeneity parameter and $\rho_B$ and $\rho_C$ are the block- and cluster-level intraclass correlations, respectively
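As a sketch, the factor can again be read as a ratio of expected error mean squares. This assumes E[MS_C:BxT] = σ²_W + nσ²_C and E[MS_BxT] = σ²_W + nσ²_C + mnσ²_BxT for the balanced design, with ρ_B and ρ_C defined as shares of σ²_B + σ²_C + σ²_W. These definitions, the function names, and the variance values are illustrative assumptions.

```python
def cluster_factor(m, n, omega_b, rho_b, rho_c):
    """Reconstructed factor for the blocked cluster-randomized design."""
    base = 1 - rho_b - rho_c + n * rho_c
    return (base + m * n * omega_b * rho_b) / base


def cluster_factor_from_components(m, n, var_c, var_w, var_bxt):
    """E[MS_BxT] / E[MS_C:BxT] under the assumed balanced-design EMS."""
    ems_c = var_w + n * var_c           # clusters within block x treatment
    ems_bxt = ems_c + m * n * var_bxt   # adds block x treatment variation
    return ems_bxt / ems_c


# Usage (assumed values): sigma^2_B = sigma^2_C = 0.1, sigma^2_W = 0.8, sigma^2_BxT = 0.02
var_b, var_c, var_w, var_bxt = 0.1, 0.1, 0.8, 0.02
total = var_b + var_c + var_w
rho_b, rho_c, omega_b = var_b / total, var_c / total, var_bxt / var_b
m, n = 5, 20
print(cluster_factor(m, n, omega_b, rho_b, rho_c),
      cluster_factor_from_components(m, n, var_c, var_w, var_bxt))
```

Both expressions evaluate to about 12/7 (roughly 1.71) for these values.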
Possible Statistical Analyses Cluster Randomized Design
Possible statistical analyses
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference
Making Unconditional Inferences Cluster Randomized Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
2. Include blocks as fixed effects. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
3. Include blocks as random effects. Correct significance levels but less power than the conditional analysis
Making Conditional Inferences Cluster Randomized Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: may substantially deflate the actual significance levels of tests for treatment effects
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than the unconditional analysis
3. Include blocks as random effects. Not such a bad idea: significance levels are unaffected
Multi-center (Randomized Blocks) Design
The issues about blocking in the multicenter (randomized blocks) design are the same as in the cluster randomized design
The inference model will determine the most appropriate statistical analysis
Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose
For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments
Multi-center (Randomized Blocks) Design
In this case you are adding a blocking factor (p blocks) crossed with treatment; clusters are also crossed with treatment, but they are still nested within blocks [here Cij is the jth cluster in the ith block]
Note that there are m clusters per block, each containing individuals in both treatments, and n individuals in each treatment in each cluster
     Block 1              …   Block p
     C11   …   C1m        …   Cp1   …   Cpm
T
C
Multi-center (Randomized Blocks) Design
How does this impact the analysis?
Think about a balanced design with p blocks, m clusters per block, and n individuals per treatment in each cluster (2mn students per block), and about the ANOVA partitioning of sums of squares and degrees of freedom
Original partitioning
SSTotal = SST + SSC + SSTxC + SSWC
dfTotal = dfT + dfC + dfTxC + dfWC
2pmn – 1 = 1 + (pm – 1) + (pm – 1) + 2pm(n – 1)
Original test statistic
F = SST/(SSTxC/dfTxC)
Multi-center (Randomized Blocks) Design
New partitioning
SSTotal = SST + SSB + SSC:B + SSBxT + SSC:BxT + SSWC
dfTotal = dfT + dfB + dfC:B + dfBxT + dfC:BxT + dfWC
2mpn – 1 = 1 + (p – 1) + p(m – 1) + (p – 1) + p(m – 1) + 2pm(n – 1)
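A quick arithmetic check: for the terms to add up to 2pmn – 1, the C:BxT term must have p(m – 1) df, consistent with the 1 and p(m – 1) df given for FC in this design. The helper names are hypothetical. The check also shows how blocking splits the original cluster and treatment-by-cluster terms.

```python
def multicenter_df(p, m, n):
    """Blocked multicenter design: p blocks, m clusters per block, n per cell."""
    return {"T": 1, "B": p - 1, "C:B": p * (m - 1), "BxT": p - 1,
            "C:BxT": p * (m - 1), "WC": 2 * p * m * (n - 1)}


def original_multicenter_df(p, m, n):
    """Original design: pm clusters crossed with treatment, n per cell."""
    return {"T": 1, "C": p * m - 1, "TxC": p * m - 1, "WC": 2 * p * m * (n - 1)}


for p, m, n in [(2, 3, 10), (4, 5, 6), (10, 2, 8)]:
    new, orig = multicenter_df(p, m, n), original_multicenter_df(p, m, n)
    assert sum(new.values()) == sum(orig.values()) == 2 * p * m * n - 1
    # blocking splits C into B + C:B and TxC into BxT + C:BxT
    assert orig["C"] == new["B"] + new["C:B"]
    assert orig["TxC"] == new["BxT"] + new["C:BxT"]
    print(p, m, n, new["C:BxT"])
```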
New test statistic?
F = SST/(SSWC/dfWC)
F = SST/(SSC:BxT/dfC:BxT)
F = SST/(SSBxT/dfBxT)
Inference Models and Statistical Analyses Randomized Blocks Design
Blocks are fixed under the conditional inference model, but clusters are typically random
In this case the correct test statistic is
FC = SST/(SSC:BxT/dfC:BxT)
and the F-distribution has 1 & p(m – 1) df
Blocks are random under the unconditional inference model, and clusters are typically random as well
In this case the correct test statistic is
FU = SST/(SSBxT/dfBxT)
and the F-distribution has 1 & (p – 1) df
Inference Models and Statistical Analyses Randomized Blocks Design
You can see that the error term has more df under the fixed effects model
If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model
It is bigger by a factor proportional to
$$\frac{1 - \rho_B - \rho_C + n\omega_C\rho_C + mn\,\omega_B\rho_B}{1 - \rho_B - \rho_C + n\omega_C\rho_C}$$
where $\omega_B = \sigma^2_{B\times T}/\sigma^2_B$ and $\omega_C = \sigma^2_{C\times T}/\sigma^2_C$ are treatment heterogeneity parameters and $\rho_B$ and $\rho_C$ are the block- and cluster-level intraclass correlations, respectively
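A sketch of the same ratio-of-expected-mean-squares reading for this design, assuming E[MS_C:BxT] = σ²_W + nσ²_CxT and E[MS_BxT] = σ²_W + nσ²_CxT + mnσ²_BxT, with ρ_B and ρ_C defined as shares of σ²_B + σ²_C + σ²_W. These definitions, the function names, and the numbers are illustrative assumptions.

```python
def multicenter_factor(m, n, omega_b, omega_c, rho_b, rho_c):
    """Reconstructed factor for the blocked multicenter design."""
    base = 1 - rho_b - rho_c + n * omega_c * rho_c
    return (base + m * n * omega_b * rho_b) / base


def multicenter_factor_from_components(m, n, var_w, var_bxt, var_cxt):
    """E[MS_BxT] / E[MS_C:BxT] under the assumed balanced-design EMS."""
    ems_cxt = var_w + n * var_cxt          # cluster x treatment within blocks
    ems_bxt = ems_cxt + m * n * var_bxt    # adds block x treatment variation
    return ems_bxt / ems_cxt


# Usage (assumed values)
var_b, var_c, var_w = 0.1, 0.1, 0.8
var_bxt, var_cxt = 0.02, 0.05
total = var_b + var_c + var_w
rho_b, rho_c = var_b / total, var_c / total
omega_b, omega_c = var_bxt / var_b, var_cxt / var_c
m, n = 5, 20
print(multicenter_factor(m, n, omega_b, omega_c, rho_b, rho_c),
      multicenter_factor_from_components(m, n, var_w, var_bxt, var_cxt))
```

Both give about 19/9 here. Setting ω_C = 1 makes the formula coincide with the one for the cluster randomized design.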
Possible Statistical Analyses Randomized Blocks Design
Possible statistical analyses
1. Ignore the blocking
2. Include blocks as fixed effects
3. Include blocks as random effects
Consequences depend on whether you want to make a conditional or unconditional inference
Making Unconditional Inferences Randomized Blocks Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
2. Include blocks as fixed effects. Bad idea: will substantially inflate the actual significance levels of tests for treatment effects
3. Include blocks as random effects. Correct significance levels but less power than the conditional analysis
Making Conditional Inferences Randomized Blocks Design
Possible statistical analyses
1. Ignore the blocking. Bad idea: may substantially deflate the actual significance levels of tests for treatment effects
2. Include blocks as fixed effects. Correct significance levels and a more powerful test than the unconditional analysis
3. Include blocks as random effects. Bad idea: may deflate significance levels and reduce power
Another Easy Question
There was some attrition from my study after assignment. Does that cause a serious problem?
This is another simple question, but the answer is far from simple. One answer can be framed using concepts of experimental design
Post Assignment Attrition
A different question has a simple answer:
Does that (attrition) cause a problem in principle?
The simple answer to that question is YES!
Randomized experiments with attrition no longer give model-free, unbiased estimates of the causal effect of treatment
Whether the bias is serious or not depends (on the model that generates the missing data)
Post Assignment Attrition
The design is changed by adding a crossed factor corresponding to missingness, as follows:
     Observed   Missing
T
C
Now we can see a problem with estimating the treatment effect from only the observed part of the design: the observed treatment effect is only part of the total treatment effect
Post Assignment Attrition
Suppose that the means are given by the μ’s and the proportions are given by the π’s (πT and πC are the proportions of dropouts)
        Observed                 Missing
        Proportion   Mean        Proportion   Mean
T       1 – πT       μTO         πT           μTM
C       1 – πC       μCO         πC           μCM
Post Assignment Attrition
The treatment effect on all individuals randomized is
$$\delta = \left[(1 - \pi_T)\mu_{TO} + \pi_T\mu_{TM}\right] - \left[(1 - \pi_C)\mu_{CO} + \pi_C\mu_{CM}\right]$$
When the proportion of dropouts is equal in T and C, so that πT = πC = π, the treatment effect on all individuals randomized is
$$\delta = (1 - \pi)(\mu_{TO} - \mu_{CO}) + \pi(\mu_{TM} - \mu_{CM})$$
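A small numeric sketch of the formula above. The function name and the means are invented for illustration: 20% attrition in both groups, the observed means from the attrition example (90 and 67), and an assumed mean of 50 for dropouts in both groups, i.e., no treatment effect among the missing.

```python
def effect_all_randomized(pi_t, mu_to, mu_tm, pi_c, mu_co, mu_cm):
    """Treatment effect on everyone randomized; pi_t, pi_c are dropout proportions."""
    mean_t = (1 - pi_t) * mu_to + pi_t * mu_tm   # mean of all assigned to T
    mean_c = (1 - pi_c) * mu_co + pi_c * mu_cm   # mean of all assigned to C
    return mean_t - mean_c


# Equal attrition (pi_T = pi_C = 0.2); dropout means are hypothetical
pi = 0.2
full = effect_all_randomized(pi, 90, 50, pi, 67, 50)
simplified = (1 - pi) * (90 - 67) + pi * (50 - 50)
print(full, simplified)  # both about 18.4, although the observed effect is 23
```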
Post Assignment Attrition
Rewriting this, we see that the average treatment effect for all individuals randomized is
$$\delta = (1 - \pi)\delta_O + \pi\delta_M$$
where δO is the treatment effect among the individuals who are observed, δM is the treatment effect among the individuals who are not observed, and δ is the treatment effect among all individuals assigned
Thus bounds on δM imply bounds on δ
Post Assignment Attrition
No estimate of the treatment effect is possible without an estimate of the treatment effect among the missing individuals
One possibility is to model (assume) that we know something about the treatment effect in the missing individuals
We can assume a range of values to get bounds on the possible treatment effect
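Because δ = (1 – π)δO + πδM is increasing in δM, an assumed range for δM maps directly to bounds on δ. A minimal sketch; the function name, the 20% attrition rate, and the assumed range of –10 to 23 for δM are all hypothetical.

```python
def effect_bounds(pi, delta_o, delta_m_range):
    """Bounds on delta = (1 - pi)*delta_O + pi*delta_M for an assumed delta_M range."""
    lo_m, hi_m = delta_m_range
    return ((1 - pi) * delta_o + pi * lo_m, (1 - pi) * delta_o + pi * hi_m)


# Hypothetical: 20% attrition in both groups, observed effect 23,
# and an assumed range of -10 to 23 for the effect among dropouts
low, high = effect_bounds(0.2, 23, (-10, 23))
print(low, high)  # about 16.4 and 23.0
```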
Post Assignment Attrition
When the attrition rate is not the same in the two treatment groups (πT ≠ πC), the analysis is trickier
One idea is to convince ourselves that the treatment effect for those who drop out is the same as for those who do not
        Observed   Missing
        Mean       Mean
T       90         33
C       67         10
T-C     23         23
Post Assignment Attrition
This does not assure that attrition has not altered the treatment effect
We have to know both μTM and μCM to identify the treatment effect; knowing δM = μTM – μCM alone is not enough
        Observed       Missing        Total
        n     Mean     n     Mean     n      Mean
T       10    90       90    33       100    39
C       90    67       10    10       100    61
T-C           23             23              -23
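The Total column can be reproduced from the n’s and means in the table. A short sketch (the helper name is hypothetical) suggests the 39, 61, and –23 shown are rounded from 38.7, 61.3, and –22.6: the effect is +23 among both observed and missing individuals, yet the unequal attrition reverses its sign overall.

```python
def pooled_mean(n_obs, mean_obs, n_miss, mean_miss):
    """Mean over everyone randomized to a group (observed plus missing)."""
    return (n_obs * mean_obs + n_miss * mean_miss) / (n_obs + n_miss)


t_total = pooled_mean(10, 90, 90, 33)   # treatment: 90 of 100 dropped out
c_total = pooled_mean(90, 67, 10, 10)   # control: 10 of 100 dropped out
print(90 - 67, 33 - 10)                 # same effect (23) in both strata
print(t_total, c_total, t_total - c_total)  # about 38.7, 61.3, -22.6
```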
Post Assignment Attrition
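Suppose that $B^L_{TM}$ and $B^L_{CM}$ are lower bounds on the means for missing individuals in the treatment and control groups, respectively, and $B^U_{TM}$ and $B^U_{CM}$ are the corresponding upper bounds
Then the lower and upper bounds on the treatment effect are
Lower
$$\left[(1 - \pi_T)\mu_{TO} + \pi_T B^L_{TM}\right] - \left[(1 - \pi_C)\mu_{CO} + \pi_C B^U_{CM}\right]$$
Upper
$$\left[(1 - \pi_T)\mu_{TO} + \pi_T B^U_{TM}\right] - \left[(1 - \pi_C)\mu_{CO} + \pi_C B^L_{CM}\right]$$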
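A minimal sketch of these bounds, applied to the attrition example (πT = 0.9, πC = 0.1, observed means 90 and 67). The function name and the assumption that scores lie between 0 and 100 (used as the bounds on both missing-group means) are hypothetical.

```python
def attrition_bounds(pi_t, mu_to, pi_c, mu_co, tm_bounds, cm_bounds):
    """Lower and upper bounds on the effect for everyone randomized.

    tm_bounds = (B^L_TM, B^U_TM) and cm_bounds = (B^L_CM, B^U_CM) are assumed
    bounds on the means of the missing treatment and control individuals.
    """
    (bl_tm, bu_tm), (bl_cm, bu_cm) = tm_bounds, cm_bounds
    observed_t, observed_c = (1 - pi_t) * mu_to, (1 - pi_c) * mu_co
    lower = (observed_t + pi_t * bl_tm) - (observed_c + pi_c * bu_cm)
    upper = (observed_t + pi_t * bu_tm) - (observed_c + pi_c * bl_cm)
    return lower, upper


# Hypothetical score range [0, 100] used as bounds for both missing-group means
lower, upper = attrition_bounds(0.9, 90, 0.1, 67, (0, 100), (0, 100))
print(lower, upper)  # about -61.3 and 38.7: wide, and the interval includes 0
```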
Post Assignment Attrition
Note that none of the results on attrition involve sampling or estimation error
Results get more complex if we take this into account, but the basic ideas are the same as those presented here
Conclusions
Many simple questions arise in connection with field experiments
The answers to these questions often require thinking through complex aspects of
• the design
• the inference model
• assumptions about missing data
No correct answers are possible without recognizing these complexities