Propensity Score Matching: A Primer in R
David Zepeda, Assistant Professor
Supply Chain & Information Management
d.zepeda@neu.edu
Center for Health Policy and Healthcare Research, Brown Bag Series
April 1, 2015
Outline
1. Problem description
2. Theory
3. Two-Step Approach
4. Implementation in R
5. Example 1 – Hospitals
6. Example 2 – Primary Care Clinics
7. Example 3 – Farm Land
8. References
Problem
Consider a study with n observational units. There are two treatment states, 0 and 1.
The response of the ith unit is $Y_{1i}$ if the ith unit received treatment = 1 and $Y_{0i}$ if the ith unit received treatment = 0.
Mean causal effect = $E(Y_1) - E(Y_0)$
The goal is to estimate the average effect over the n observations.
Let $X_i$ be a vector of the characteristics of the ith unit (i.e., covariates) that are not consequences of the treatment.
The mean causal effect averaged over all units is the average treatment effect (ATE).
ATE = $\frac{1}{n} \sum_{i=1}^{n} E(Y_{1i} - Y_{0i} \mid X_i)$
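A small simulation can make the definition concrete. The sketch below is not from the talk: it invents both responses for every unit (something never available in real data), with an assumed constant treatment effect of 2, and averages the unit-level effects.

```r
# Hypothetical simulated data: both potential responses are known here,
# which is never the case with real observational data.
set.seed(1)
n  <- 1000
x  <- rnorm(n)              # a single covariate
y0 <- 1 + x + rnorm(n)      # response under treatment = 0
y1 <- y0 + 2                # response under treatment = 1 (assumed effect of 2)

unit_effects <- y1 - y0     # unit-level causal effects
ate <- mean(unit_effects)   # average over all n units
ate
```

With real data only one of `y0` and `y1` is observed per unit, which is why the ATE has to be estimated rather than computed this way.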
An observational unit is generally assigned only one of the two treatments, so only one of its two responses is observed.
The treatment is not randomly assigned.
This results in a number of potential problems regarding bias and model dependence.
Source: Ho, D. E., Imai, K., King, G. & Stuart, E.A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.
Theory
Let T be an indicator of treatment, with T = 1 for treated units and T = 0 for control units.
Now compare a randomly selected treated unit to a randomly selected control unit.
Because this comparison conditions on T, the expected difference between the two randomly selected units is not in general the mean causal effect:
$E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) \neq E(Y_1) - E(Y_0)$
But if treatment assignment is independent of the responses given the covariates, $(Y_1, Y_0) \perp T \mid X$, then we have
$E(Y_1 \mid T = 1, X) - E(Y_0 \mid T = 0, X) = E(Y_1 - Y_0 \mid X)$
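The contrast between the unconditional and the conditional comparison can be illustrated with simulated data. This is a minimal sketch under invented assumptions: one binary covariate `x`, treatment assignment that depends only on `x`, and a true effect of 2 for every unit.

```r
set.seed(2)
n     <- 10000
x     <- rbinom(n, 1, 0.5)                        # one binary covariate
treat <- rbinom(n, 1, ifelse(x == 1, 0.8, 0.2))   # assignment depends on x only
y0    <- 1 + 3 * x + rnorm(n)
y1    <- y0 + 2                                   # assumed true effect of 2
y     <- ifelse(treat == 1, y1, y0)               # only one response is observed

# Unconditional comparison of treated and control units
naive <- mean(y[treat == 1]) - mean(y[treat == 0])

# Comparison within each level of x, averaged over the distribution of x
within_x <- sapply(0:1, function(v) {
  mean(y[treat == 1 & x == v]) - mean(y[treat == 0 & x == v])
})
adjusted <- sum(within_x * c(mean(x == 0), mean(x == 1)))

c(naive = naive, adjusted = adjusted)
```

Here the unconditional difference mixes the treatment effect with the effect of `x`, while the comparison made within levels of `x` stays close to the assumed effect.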
A tool used to estimate treatment effects is a “balancing score”. Suppose $b(X)$ is a function of the covariates such that
$X \perp T \mid b(X)$
and such that
$(Y_1, Y_0) \perp T \mid b(X)$
Sampling a treated and a control unit with the same value of $b(X)$, we have
$E(Y_1 \mid b(X), T = 1) - E(Y_0 \mid b(X), T = 0) = E(Y_1 - Y_0 \mid b(X))$
A propensity score is $p(X) = \Pr(T = 1 \mid X)$, the probability that a unit receives treatment given its covariates.
The true propensity score is a “balancing score”: if the treatment and control groups have identical propensity score distributions, then all covariates are balanced between the two groups.
Therefore, we can use logistic regression or another approach (e.g., probit) to estimate $p$ and obtain matched samples.
We can now use the same parametric analysis on the matched dataset as would have been used to analyze the original raw dataset prior to propensity score matching.
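Estimating the score with logistic regression needs only base R. The following is a minimal sketch on simulated data; the data frame `dat`, the covariates `x1` and `x2`, and the coefficients used to generate treatment are all invented for illustration.

```r
set.seed(3)
n   <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))
dat$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 + 0.6 * dat$x2))

# Logistic regression of the treatment indicator on the covariates
ps_model <- glm(treat ~ x1 + x2, data = dat, family = binomial(link = "logit"))
dat$pscore <- fitted(ps_model)    # estimated Pr(treat = 1 | x1, x2)

# Compare the score distributions in the two groups before any matching
tapply(dat$pscore, dat$treat, summary)
```

Replacing `link = "logit"` with `link = "probit"` gives the probit alternative mentioned above.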
Two-Step Approach
1. Preprocessing via matching
- Distributions of $X$ should be matched as closely as possible (i.e., balance)
- Relationship between T and $X$ is eliminated
- Do not introduce bias
- Do not increase inefficiency too much
- Create a dataset closer to an experiment
2. Parametric analysis after matching
- Holding covariates constant
- Increase robustness
- Less sensitive to modeling assumptions
- A variety of treatment effects can be analyzed (e.g., ATE)
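The two steps can be sketched end to end in base R. This is an illustration only, not the procedure used in the examples later in the talk: the data are simulated with an assumed true effect of 2, and step 1 uses a simple greedy 1:1 nearest neighbor match on the estimated propensity score, without replacement.

```r
set.seed(4)
n   <- 1000
dat <- data.frame(x = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(-1 + dat$x))
dat$y <- 1 + 2 * dat$x + 2 * dat$treat + rnorm(n)   # assumed true effect of 2

# Step 1: preprocessing. Estimate the propensity score, then pair each
# treated unit with the closest control unit that has not been used yet.
dat$pscore <- fitted(glm(treat ~ x, data = dat, family = binomial))
treated  <- which(dat$treat == 1)
controls <- which(dat$treat == 0)
matched_controls <- integer(0)
for (i in treated) {
  j <- controls[which.min(abs(dat$pscore[controls] - dat$pscore[i]))]
  matched_controls <- c(matched_controls, j)
  controls <- setdiff(controls, j)
}
matched <- dat[c(treated, matched_controls), ]

# Step 2: the same parametric model that would have been fit to the raw data
fit <- lm(y ~ treat + x, data = matched)
coef(fit)["treat"]
```

Greedy matching depends on the order in which treated units are processed; it is used here only because it is short enough to read in full.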
Implementation in R
What is R?
- A language and environment for statistical computing and graphics
- Provides a wide variety of statistical and graphical techniques
- Is highly extensible
- Provides an Open Source route to participation
- Great care has been taken over the defaults for the minor design choices in graphics
- User retains full control
- Available as Free Software
- Allows users to add additional functionality
- Can be extended (easily) via packages
The R Project for Statistical Computing: http://www.r-project.org/
MatchIt Package
- Dichotomous treatment variable
- Experimental and observational data
- Improving parametric statistical models
- Reduces model dependence
- Semi-parametric and non-parametric preprocessing
- Assess covariate distributions in the two groups (i.e., balance)
Large range of matching methods
- Exact
- Subclassification
- Nearest neighbor
- Optimal
- Genetic
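A basic MatchIt session might look like the sketch below. It assumes the MatchIt package is installed and uses the `lalonde` example data that ships with it; the choice of covariates is arbitrary and only for illustration.

```r
library(MatchIt)                        # install.packages("MatchIt") if needed
data("lalonde", package = "MatchIt")    # example data shipped with the package

# 1:1 nearest neighbor matching on an estimated propensity score
m_out <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest")

summary(m_out)                  # covariate balance before and after matching
matched <- match.data(m_out)    # matched dataset for the parametric analysis
table(matched$treat)
```

The `method` argument selects the matching method. Some methods rely on additional packages being installed, so the nearest neighbor method is used here.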
Exact matching
- Simplest version of matching
- Match each treated unit to all possible control units
- Exactly the same values on all the covariates
- Sufficient matches often cannot be found
Subclassification
- Forms subclasses with “close” distributions of covariates
- Various subclassification schemes
- Can be used in conjunction with other matching methods
Nearest neighbor matching
- Selects “best” control matches for each treated unit
- Chooses the control unit not yet matched closest to treated unit
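Subclassification can be sketched without any package. The example below is one possible scheme among many, chosen here as an assumption: five subclasses cut at the quintiles of an estimated propensity score, on simulated data with an assumed true effect of 2.

```r
set.seed(5)
n   <- 4000
dat <- data.frame(x = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(dat$x))
dat$y <- 1 + 2 * dat$x + 2 * dat$treat + rnorm(n)   # assumed true effect of 2
dat$pscore <- fitted(glm(treat ~ x, data = dat, family = binomial))

# Five subclasses with cut points at the quintiles of the estimated score
breaks <- quantile(dat$pscore, probs = seq(0, 1, by = 0.2))
dat$subclass <- cut(dat$pscore, breaks = breaks,
                    include.lowest = TRUE, labels = FALSE)

# Difference in mean response within each subclass, then a size-weighted average
diffs <- sapply(split(dat, dat$subclass), function(d) {
  mean(d$y[d$treat == 1]) - mean(d$y[d$treat == 0])
})
sizes    <- as.numeric(table(dat$subclass))
estimate <- sum(diffs * sizes) / sum(sizes)
naive    <- mean(dat$y[dat$treat == 1]) - mean(dat$y[dat$treat == 0])

c(naive = naive, subclassified = estimate)
```

Within a subclass the treated and control units have similar scores, so the within-subclass differences are less affected by `x` than the raw comparison.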
Optimal matching
- Finds matched samples with smallest average absolute distance
- Helpful when there are not many appropriate control matches
Genetic matching
- Uses a genetic search algorithm
- Optimal balance achieved after matching
- Performs statistical tests for determining balance
Variety of options for matching methods
- Number of matched control units
- Matching with or without replacement
- Kernel matching
- Discard treated units, control units, or both
- Number of subclasses
- Distance measurement (e.g., logit)
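Two of these options, the number of matched controls and matching with replacement, map onto the `ratio` and `replace` arguments of `matchit()`. The sketch below assumes the MatchIt package is installed and again uses its bundled `lalonde` data; the covariates and the outcome model are illustrative choices, not taken from the talk.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Two controls per treated unit, drawn with replacement
m_out <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest",
                 ratio = 2, replace = TRUE)

matched <- match.data(m_out)

# With replacement a control can serve several treated units, so the
# matched data carry weights that the outcome model should use.
fit <- lm(re78 ~ treat + age + educ + married + nodegree + re74 + re75,
          data = matched, weights = weights)
coef(fit)["treat"]
```

The distance measure is left at the package default here, since its name for a logistic-regression propensity score has differed between MatchIt versions.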
Example 1
Association between hospital system affiliation and hospital inventory in California hospitals (Zepeda, Nyaga, & Young, WP 2015)
- California hospital data from 2007 – 2009
- 878 observations (126 affiliated with smaller hospital systems)
- Preprocessing of data on affiliation with smaller hospital systems
- Genetic matching method
- 2 control observations with replacement for every treated observation
- 126 observations in treatment group
- 156 observations in control group
- Propensity score balancing improved by 95%
Example 2
Association between IT-leveraging capability and high quality diabetes care in Minnesota primary care clinics (Zepeda & Sinha, WP 2015)
- Minnesota primary care clinics in 2010
- 450 observations (135 with high IT-leveraging capability)
- Preprocessing of data on high IT-leveraging capability
- Optimal matching method
- 1 control observation without replacement for every treated observation
- 135 observations in treatment group
- 135 observations in control group
- Propensity score balancing improved by 76%
Example 3
Effect of easements on the selling price of farms in Minnesota (Taff & Weisberg, 2007)
- Federal Conservation Reserve Program (CRP)
- Temporary conservation easement by USDA (10-15 years)
- Annual payment by USDA for enrolled land
- Land valuation theory predicts that temporary easements should have no effect on value of properties
Data
- Oct 1, 2002 – Sep 30, 2004
- Farm properties with short-term conservation easements
- Farm properties with no conservation easements
- Covariates
- 2,937 property sales (271 were restricted by CRP contracts)
The primary objective
- Compare 271 sales with CRP restrictions to sales without
Standard observational study approach
- Use all sales with no CRP as a comparison group
Potential problem
- Properties sold without a random assignment
- Differences between observable sample and target population may be a cause for bias
Using propensity score matching
- Mimic a randomized experiment
- Sample of non-CRP and CRP sales
- Closely agree on salient property characteristics (i.e., balance)
Figure legend: medians, upper 75%, lower 25%; dotted lines = 95%.
Six models developed and tested
- Models 1 – 3: use all data, CRP and portion of land RESTRICTED
- Model 4: restricts data to sales with PRODUCTIVITY measure
- Model 5: matched sample on CRP restriction
- Model 6: matched sample with PRODUCTIVITY measure
Consistency in results
- CRP contracts negatively associated with sales prices
- Most of CRP effect is captured by RESTRICTED amount
- Counter to land valuation theory
References
The R Project for Statistical Computing: http://www.r-project.org/
MatchIt R Package: http://gking.harvard.edu/matchit
Ho, D. E., Imai, K., King, G., & Stuart, E. A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15: 199-236.
Examples
Zepeda, D., Nyaga, G., & Young, G. 2015. Supply Chain Risk Management and Hospital Inventory: Effects of System Affiliation. Working Paper.
Zepeda, D., & Sinha, K. IT-Leveraging Capability for Reducing Health Care Disparities: An Empirical Analysis of Primary Care Operations. Working Paper.
Taff, S. J., & Weisberg, S. 2007. Compensated short-term conservation restrictions may reduce sales prices. The Appraisal Journal, Winter.