![Page 1: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/1.jpg)
CESS Workshop
Oxford University
June 2013
Don Green
Professor of Political Science
Columbia University
![Page 2: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/2.jpg)
Outline
•Examples of ongoing field experiments•Why experiment? Why experiment in the field?•Key ideas in Field Experiments: Design, Analysis, and Interpretation
• Randomization inference• Noncompliance• Attrition• Spillover
![Page 3: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/3.jpg)
Brief sketches of some current experimental projects
• Voter turnout and persuasion in the US and abroad (mass media, mail, phones, shoe leather, events…) with a recent focus on the influence of social norms
• Downstream experiments: habit, education and political participation
• Prejudice reduction• Criminal sentencing and deterrence• Civic education and political attitudes• Media and pro-social behavior
![Page 4: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/4.jpg)
Why experiment?
The role of scientific procedures/design in making the case for unbiased inference
Unobserved confounders and lack of clear stopping rules in observational research
The value of transparent & intuitive science that involves ex ante design/planning and, in principle, limits the analyst’s discretion
Experiments provide benchmarks for evaluating other forms of research
![Page 5: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/5.jpg)
Why experiment in field settings?
Four considerations regarding generalizability Subjects, treatments, contexts, outcome
measures
Narrowing the gap between an experimental design and the target estimand
Trade off: must attend to treatment fidelity
Systematic experimental inquiry can lead to the discovery and development of useful and theoretically illuminating interventions
![Page 6: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/6.jpg)
What is an experiment?
Perfectly controlled experiments in the physical sciences versus randomized experiments in the behavioral sciences
A study that assigns subjects to treatment with a known probability between 0 and 1
Different types of random assignment: simple, complete, blocked, clustered
Confusion between random sampling and random assignment
![Page 7: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/7.jpg)
Experiment versus Alternative Research Designs: Example of Assessing the Effects of Election Monitors on Vote Fraud (inspired by the work of Susan Hyde)
Randomized experiment: Researcher randomly assigns election monitoring teams to polling locations
Natural/quasi experiment: Researcher compares polling locations visited by monitoring teams to polling locations not visited because some monitoring team leaders were sick and unable to travel
Observational Study: Researcher compares polling locations visited by monitoring teams to polling locations not visited
![Page 8: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/8.jpg)
Are experiments feasible?
Large-scale, government-funded “social experiments” and evaluations
Research collaborations with political campaigns, organizations, etc. who allocate resources
Researcher-driven/designed interventions
Seizing opportunities presented by naturally occurring randomized assignment
![Page 9: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/9.jpg)
Are experiments necessary?
Not when the counterfactual outcome is obvious (e.g., the effects of parachutes on the well-being of skydivers) By the way, how do we know it’s obvious?
Not when there is little or no heterogeneity among the units of observation (e.g., consumer product testing)
Not when the apparent effect is so large that there is no reason to attribute it to unobserved heterogeneity (e.g., the butterfly ballot effect in the 2000 election)
…for most behavioral science applications, experiments are indispensable
![Page 10: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/10.jpg)
Themes of FEDAI
Importance of… Defining the estimand
Appreciating core assumptions under which a given experimental design will recover the estimand
Conducting data analysis in a manner that follows logically from the randomization procedure
Following procedures that limit the role of discretion in data analysis
Presenting results in a detailed and transparent manner
![Page 11: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/11.jpg)
Potential outcomes
Potential outcomes: Yi(d) for d = {0,1}
Unit-level treatment effect: Yi(1) – Yi(0)
Average treatment effect: E[Yi(1) – Yi(0)]
Indicator for treatment received: di
Observed outcome: Yi =diYi(1) + (1-di)Yi(0)
![Page 12: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/12.jpg)
Example: a (hypothetical) schedule of potential outcomes
![Page 13: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/13.jpg)
Core assumptions
Random assignment of subjects to treatments implies that receiving the treatment is statistically
independent of subjects’ potential outcomes Non-interference: a subject’s potential outcomes reflect
only whether they receive the treatment themselves A subject’s potential outcomes are unaffected by how the
treatments happened to be allocated Excludability: a subject’s potential outcomes respond
only to the defined treatment, not other extraneous factors that may be correlated with treatment
Importance of defining the treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)
![Page 14: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/14.jpg)
Randomization inference
State a set of assumptions under which all potential outcomes become empirically observable (e.g., the sharp null hypothesis of no effect for any unit)
Define a test statistic: estimated it in the sample at hand and simulate its sampling distribution over all (or effectively all) possible random assignments
Conduct hypothesis testing in a manner that follows from the randomization procedure (e.g., blocking, clustering, restricted randomizations) using the ri() package in R.
Analogous procedures for confidence intervals
![Page 15: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/15.jpg)
Using the ri() package in R (Aronow & Samii 2012)
Uses observed data and a maintained hypothesis to impute a full schedule of potential outcomes
Detects complex designs and makes appropriate adjustments to estimators of the ATE (or CACE)
Avoids common data analysis errors related to blocking or clustering
Simulates the randomization distribution and calculates p-values and confidence intervals
Provides a unified framework for a wide array of tests and sidesteps distributional assumptions
![Page 16: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/16.jpg)
Example: RI versus t-tests as applied to a small contributions experiment (n=20)
One-tailed p-values for the estimated ATE of 70: randomization inference p=0.032 t(equal variances) p=0.082 t(unequal variances) p=0.091
![Page 17: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/17.jpg)
Example of a common error in the analysis of a block-randomized design
A GOTV phone-calling experiment conducted in two blocks: competitive and uncompetitive. We effectively have two experiments, one in each block.
Voted in Nov., 2002? * Called by Phone Bank? * Competitive House Race? Crosstabulation
657709 8540 666249
57.0% 56.9% 57.0%
496900 6460 503360
43.0% 43.1% 43.0%
1154609 15000 1169609
100.0% 100.0% 100.0%
166649 7888 174537
52.5% 52.6% 52.5%
150533 7112 157645
47.5% 47.4% 47.5%
317182 15000 332182
100.0% 100.0% 100.0%
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Abstained
Voted
Voted in Nov.,2002?
Total
Abstained
Voted
Voted in Nov.,2002?
Total
CompetitiveHouse Race?No
Yes
Not called(control)
called(treatment)
Called by Phone Bank?
Total
![Page 18: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/18.jpg)
Doh! Notice what happens if you neglect to control for the blocks!
Statistically significant – and misleading – results…
Voted in Nov., 2002? * Called by Phone Bank? Crosstabulation
824358 16428 840786
56.0% 54.8% 56.0%
647433 13572 661005
44.0% 45.2% 44.0%
1471791 30000 1501791
100.0% 100.0% 100.0%
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Count
% within Calledby Phone Bank?
Abstained
Voted
Voted in Nov.,2002?
Total
Not called(control)
called(treatment)
Called by Phone Bank?
Total
![Page 19: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/19.jpg)
Noncompliance: avoiding common errors
People you fail to treat are NOT part of the control group!
Do not throw out subjects who fail to comply with their assigned treatment
Base your estimation strategy on the ORIGINAL treatment and control groups, which were randomly assigned and therefore have comparable potential outcomes
![Page 20: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/20.jpg)
A misleading comparison: comparing groups according to the treatment they received rather than the treatment they were assigned
What if we had compared those contacted and those not contacted in the phone bank study? Voted in Nov., 2002? * Reached by Phone Bank? * Competitive House Race? Crosstabulation
663404 2845 666249
57.0% 51.6% 57.0%
500687 2673 503360
43.0% 48.4% 43.0%
1164091 5518 1169609
100.0% 100.0% 100.0%
171819 2718 174537
52.6% 46.9% 52.5%
154567 3078 157645
47.4% 53.1% 47.5%
326386 5796 332182
100.0% 100.0% 100.0%
Count
% within Reachedby Phone Bank?
Count
% within Reachedby Phone Bank?
Count
% within Reachedby Phone Bank?
Count
% within Reachedby Phone Bank?
Count
% within Reachedby Phone Bank?
Count
% within Reachedby Phone Bank?
Abstained
Voted
Voted in Nov.,2002?
Total
Abstained
Voted
Voted in Nov.,2002?
Total
CompetitiveHouse Race?No
Yes
No Yes
Reached by PhoneBank?
Total
![Page 21: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/21.jpg)
Addressing (one-sided) noncompliance statistically
Define “Compliers” and estimate the average treatment effect within this subgroup
Model the expected treatment and control group means as weighted averages of latent groups, “Compliers” and “Never-takers”
Assume excludability: assignment to treatment only affects outcomes insofar as it affects receipt of the treatment (the plausibility of this assumption varies by application)
![Page 22: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/22.jpg)
Simplified model notation
Suppose that the subject pool consists of two kinds of people: Compliers and Never-takers
Let Pc = the probability that Compliers vote
Let Pn = the probability that Never-takers vote
Let = the proportion of Compliers people in the subject pool
Let T = the average treatment effect among Compliers
![Page 23: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/23.jpg)
Expected voting rates in control and treatment groupsProbability of voting in the control group (V0) is a weighted average of Complier and Never-taker voting rates:
V0 = Pc + (1- ) Pn
Probability of voting in the treatment group (V1) is also a weighted average of Complier and Never-taker voting rates:
V1 = (Pc+ T) + (1- ) Pn
![Page 24: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/24.jpg)
Derive an Estimator for the Treatment-on-treated Effect (T)
V1 - V0 = (Pc+T)+(1- )Pn – { Pc + (1- ) Pn }
= T
T is the “intent to treat” effect
To estimate T, insert sample values into the formula:
T* = (V*1 – V*0)/where is the proportion of contacted people
(Compliers) observed in the treatment group, and V*1 and V*0 are the observed voting rates in the assigned treatment and control groups, respectively
![Page 25: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/25.jpg)
Example: Door-to-door canvassing (only) in New Haven, 1998: =.3208
![Page 26: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/26.jpg)
Example: V*1=.4626, V*0=.4220
![Page 27: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/27.jpg)
Estimate Actual Treatment Effect
( V*1– V*0 ) / = ( 46.26 – 42.20) / .3208 = 12.7
In other words: actual contact with canvassers increased turnout by 12.7 percentage-points
This estimator is equivalent to instrumental variables regression, where assignment to treatment is the instrument for actual contact.
Notice that we NEVER compare the voting rates of those who were contacted to those who were not contacted!…Why not?
![Page 28: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/28.jpg)
Design question: Compare the assigned treatment group to an untreated control group or a placebo group?
Placebo must be (1) ineffective and (2) administered in the same way as the treatment, such that assignment to placebo/treatment is random among those who are contacted Assignment to placebo should be blinded and
made at the last possible moment before treatment
Placebo design can generate more precise estimates when contact rates are low
![Page 29: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/29.jpg)
Nickerson’s (2005) Canvassing Experiment: GOTV, Placebo, and Control Groups
Contact rate was low (GOTV: 18.9% and placebo: 18.2%)
Turnout results for GOTV and control: GOTV treatment (N=2,572): 33.9% Control (N=2,572): 31.2% Treatment on treated: b=.144, SE=.069.
Turnout results for contacted GOTV and contacted placebo: GOTV treatment (N=486): 39.1% Placebo (N=470): 29.8% Treatment on treated: b=.093, SE=.031
Bottom line: placebo control led to more precise estimates
![Page 30: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/30.jpg)
Attrition
Can present a grave threat to any experiment because missing outcomes effectively un-randomize the assignment of subjects
If you confront attrition, consider whether it threatens the symmetry between assigned experimental groups
Consider design-based solutions such as an intensive effort to gather outcomes from a random sample of the missing
![Page 31: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/31.jpg)
Spillovers
Complication: equal-probability random assignment of units does not imply equal-probability assignment of exposure to spillovers
Unweighted difference-in-means or unweighted regression can give severely biased estimates
![Page 32: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/32.jpg)
Hypotheses about spillovers
Contagion: The effect of being vaccinated on one’s probability of contracting a disease depends on whether others have been vaccinated. Displacement: Police interventions designed to suppress crime in one location may displace criminal activity to nearby locations.
Communication: Interventions that convey information about commercial products, entertainment, or political causes may spread from individuals who receive the treatment to others who are nominally untreated. Social comparison: An intervention that offers housing assistance to a treatment group may change the way in which those in the control group evaluate their own housing conditions.
Persistence and memory: Within-subjects experiments, in which outcomes for a given unit are tracked over time, may involve “carryover” or “anticipation.”
![Page 33: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/33.jpg)
Example: Assessing the effects of lawn signs on a congressional candidate’s vote margin
Complication: the precinct in which a lawn sign is planted may not be the precinct in which those who see the lawn sign cast their votes
Exposure model: define a potential outcome for (1) precincts that receive signs, (2) precincts that are adjacent to precincts with signs, and (3) precincts that are neither treated nor adjacent to treated precincts
Further complication: precincts have different probabilities of assignment to the three conditions
![Page 34: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/34.jpg)
![Page 35: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/35.jpg)
![Page 36: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/36.jpg)
![Page 37: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/37.jpg)
![Page 38: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/38.jpg)
Non-interference: Summing up
Unless you specifically aim to study spillover, displacement, or contagion, design your study to minimize interference between subjects. Segregate your subjects temporally or spatially so that the assignment or treatment of one subject has no effect on another subject’s potential outcomes.
If you seek to estimate spillover effects, remember that you may need to use inverse probability weights to obtain consistent estimates
![Page 39: CESS Workshop Oxford University June 2013 Don Green Professor of Political Science Columbia University](https://reader036.vdocument.in/reader036/viewer/2022081602/551b2e3355034607418b61f6/html5/thumbnails/39.jpg)
Expect to perform a series of experiments.
In social science, one experiment is rarely sufficient to isolate a causal parameter.
Experimental findings should be viewed as provisional, pending replication, and even several replications and extensions only begin to suggest the full range of conditions under which a treatment effect may vary across subjects and contexts.
Every experiment raises new questions, and every experimental design can be improved in some way.