nearest neighbor matching
Post on 22-Feb-2016
Nearest Neighbor Matching Using the Greedy Match Macro
Note: Much of the code was originally written by Lori Parsons: http://www2.sas.com/proceedings/sugi26/p214-26.pdf
This code was written with simplicity as the primary concern. If you do not have a large number of controls, you may want to modify it.
/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;
/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;
Include the Macro
%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;
%propen(libname, dsname, idvariable, dependent, propensity)
LIBNAME = libref (library) for the data sets
DSNAME = data set with the study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable (1 = case, 0 = control)
PROPENSITY = propensity score produced by the logistic regression
For example:
%propen(study,allpropen,id,athome,prob);
Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just ran.
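The macro assumes the propensity score is already on the data set. As a minimal sketch of where such a score might come from, the code below simulates data and fits a logistic model. The data set work.sim, the covariates age and female, and the coefficients are all hypothetical, not the study's actual model.

```sas
/* Hypothetical simulated data standing in for the real study file */
DATA work.sim ;
  CALL STREAMINIT(2016) ;                        /* fixed seed: reruns agree */
  DO id = 1 TO 500 ;
    age    = 65 + FLOOR(30 * RAND('UNIFORM')) ;  /* ages 65 to 94 */
    female = RAND('BERNOULLI', 0.5) ;
    /* assumed "true" model for being at home */
    p_true = 1 / (1 + EXP(-(-4 + 0.05 * age + 0.3 * female))) ;
    athome = RAND('BERNOULLI', p_true) ;
    OUTPUT ;
  END ;
  DROP p_true ;
RUN ;

/* Model P(athome = 1); P= saves the predicted probability as prob */
PROC LOGISTIC DATA = work.sim ;
  MODEL athome(EVENT = '1') = age female ;
  OUTPUT OUT = work.allpropen P = prob ;
RUN ;

/* A record with a missing covariate gets a missing prob; drop such
   records so missing scores are not "matched" to one another */
DATA work.allpropen ;
  SET work.allpropen ;
  IF prob NE . ;
RUN ;
```

With the real data, the call on the slide, %propen(study,allpropen,id,athome,prob), would then follow.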
Explaining the Macro
A Challenge
%macro propen(lib,dsn,id,depend,prob);
Data in5 ;
  set &lib..&dsn ;
Creates a temporary data set
Propensity scores rounded to 5, then 4, 3, 2 and 1 decimals
%Do countr = 1 %to 5 ;
  %let digits = %eval(6 - &countr) ;
  %let roundto = %eval(10**&digits) ;
  %let roundto = %sysevalf(1/&roundto) ;
  %let nextin = %eval(&digits - 1) ;
MACRO NOTES
%Do countr = 1 %to 5 ; /* Starts %DO loop */
Use the %EVAL function for integer arithmetic
%let digits = %eval(6 - &countr) ;
Use the %SYSEVALF function for non-integer (floating-point) arithmetic
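To see what the loop's macro variables hold on each pass, a small hypothetical macro (show_rounding, not part of %propen) can repeat the same %LET statements and write the values to the log. It also shows why %SYSEVALF is needed: %EVAL does integer arithmetic, so 1/1000 truncates to 0.

```sas
/* Hypothetical helper: prints the values the %propen loop computes */
%macro show_rounding ;
  %do countr = 1 %to 5 ;
    %let digits  = %eval(6 - &countr) ;      /* 5, 4, 3, 2, 1 */
    %let roundto = %eval(10**&digits) ;      /* integer power of ten */
    %let roundto = %sysevalf(1/&roundto) ;   /* rounding unit, e.g. 0.001 */
    %let nextin  = %eval(&digits - 1) ;      /* suffix of the leftovers set */
    %put countr=&countr digits=&digits roundto=&roundto nextin=&nextin ;
  %end ;
  /* %EVAL truncates the division to an integer; %SYSEVALF does not */
  %put With EVAL: %eval(1/1000)   With SYSEVALF: %sysevalf(1/1000) ;
%mend show_rounding ;

%show_rounding
```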
/* Output control to one data set, intervention to another */
/* Create random number to sort within group */
Create 2 data sets
DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
  SET in&digits ;
We go through this loop 5 times, creating data sets of records matched to 5, 4, 3, 2 and 1 decimal places. We keep only four variables.
Assignment statements
randnum = RANUNI(0) ;
&prob = ROUND(&prob,&roundto) ;
Create a random number, and round the propensity score to a set number of decimal places
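A small sketch of the two assignment statements on made-up scores. Two scores that differ at the fifth decimal place become identical at three, which is how subjects left unmatched at a finer level can still match at a coarser one. The seed 20160222 is an assumption: the macro's RANUNI(0) seeds from the clock, so its random tie-breaking differs between runs, while a positive seed makes reruns repeat.

```sas
/* Hypothetical scores: ids 1 and 2 tie at 3 decimals but not at 5 */
DATA work.rounded ;
  INPUT id prob ;
  randnum = RANUNI(20160222) ;  /* positive seed (assumption): reruns repeat */
  prob5 = ROUND(prob, 0.00001) ;
  prob3 = ROUND(prob, 0.001) ;
  prob1 = ROUND(prob, 0.1) ;
  DATALINES ;
1 0.123456
2 0.123449
3 0.678912
;
RUN ;

PROC PRINT DATA = work.rounded ;
RUN ;
```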
Output to case data set …
IF &depend = 1 THEN DO ;
  id_y = &id ;
  depend_y = &depend ;
  OUTPUT yes1 ;
END ;
We need to rename the dependent and ID variables; otherwise they will be overwritten when the case and control records are merged.
… or output to control data set
ELSE IF &depend = 0 THEN DO ;
  id_n = &id ;
  depend_n = &depend ;
  OUTPUT no1 ;
END ;
Notice the data sets were named no1 and yes1. It will become evident why shortly.
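Why the renaming matters: when two data sets in a MERGE share a variable name, the value from the data set listed last wins. A sketch with one hypothetical case and one hypothetical control sharing a score:

```sas
/* Hypothetical one-case and one-control data sets with the same score */
DATA work.cases ;
  INPUT prob id ;
  DATALINES ;
0.12 101
;
RUN ;

DATA work.controls ;
  INPUT prob id ;
  DATALINES ;
0.12 205
;
RUN ;

/* Without renaming: id from controls (listed last) overwrites id 101 */
DATA work.clobbered ;
  MERGE work.cases work.controls ;
  BY prob ;
RUN ;

/* With renaming, as the macro does: both IDs survive on one row */
DATA work.paired ;
  MERGE work.cases (RENAME = (id = id_y)) work.controls (RENAME = (id = id_n)) ;
  BY prob ;
RUN ;
```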
/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */
%Do i = 1 %to 20 ;
  %let j = %eval(&i +1) ;
  proc sort data = yes&i ;
    by &prob randnum ;
  data yes&i yes&j ;
    set yes&i ;
    by &prob ;
    if first.&prob then output yes&i ;
    else output yes&j ;
NOTE: Matching without replacement
Same thing for controls
proc sort data = no&i ;
  by &prob randnum ;
data no&i no&j ;
  set no&i ;
  by &prob ;
  if first.&prob then output no&i ;
  else output no&j ;
The randnum variable ensures that, when several subjects share a score, the one matched on each pass is picked at random.
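One pass of the tie-splitting step, run outside the macro on four hypothetical cases (three share a score of 0.30). After the pass, work.yes1 holds one randomly chosen record per score and work.yes2 holds the leftovers, which the next pass splits the same way; that is why 20 passes can match up to 20 subjects with the same score. The seed 42 is an assumption for repeatability.

```sas
/* Hypothetical cases: three share prob 0.30, one has 0.45 */
DATA work.yes1 ;
  INPUT id_y prob ;
  randnum = RANUNI(42) ;   /* random tie-breaker; fixed seed here */
  DATALINES ;
1 0.30
2 0.30
3 0.30
4 0.45
;
RUN ;

/* One pass of the %DO i loop with i = 1, j = 2 */
PROC SORT DATA = work.yes1 ;
  BY prob randnum ;
RUN ;

DATA work.yes1 work.yes2 ;
  SET work.yes1 ;
  BY prob ;
  IF FIRST.prob THEN OUTPUT work.yes1 ;   /* one record per score */
  ELSE OUTPUT work.yes2 ;                 /* ties wait for the next pass */
RUN ;
```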
Merge matches, end loop
DATA match&i ;
  MERGE yes&i (in= ina) no&i (in= inb) ;
  BY &prob ;
  IF ina AND inb ;
run ;
%END ;
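The IN= flags are what drop unmatched scores. In this sketch with hypothetical scores, only 0.30 appears in both the case and control sets, so work.match1 gets one row pairing case 1 with control 17; the case at 0.45 and the control at 0.52 are left out of this pass's matches.

```sas
/* Hypothetical first-pass case and control sets, one record per score */
DATA work.yes1 ;
  INPUT prob id_y ;
  DATALINES ;
0.30 1
0.45 4
;
RUN ;

DATA work.no1 ;
  INPUT prob id_n ;
  DATALINES ;
0.30 17
0.52 23
;
RUN ;

/* Keep only scores present in both sets: one case paired with one control */
DATA work.match1 ;
  MERGE work.yes1 (IN = ina) work.no1 (IN = inb) ;
  BY prob ;
  IF ina AND inb ;
RUN ;
```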
/* Adds all matches into a single data set */
DATA allmatches ;
  SET %DO k = 1 %TO 20 ; match&k %END ; ; /* the last ; ends the SET statement */
Concatenate all data sets with matches (N=20)
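The %DO loop inside the SET statement generates the list of data set names. The semicolons after %TO 20 and %END end the macro statements, not the SET statement, so a separate semicolon is needed to close SET. A hypothetical macro (stack) with three stand-in data sets shows the pattern; as in %propen, it has to live inside a macro definition, because iterative %DO loops are not allowed in open code.

```sas
/* Hypothetical stand-ins for match1-match3 */
DATA work.match1 ; x = 1 ; RUN ;
DATA work.match2 ; x = 2 ; RUN ;
DATA work.match3 ; x = 3 ; RUN ;

%macro stack(n) ;
  DATA work.allmatches ;
    /* generates: SET work.match1 work.match2 ... work.match&n ; */
    SET %DO k = 1 %TO &n ; work.match&k %END ; ;
  RUN ;
%mend stack ;

%stack(3)
```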
Create two data sets with IDs
DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
  SET allmatches ;
Create one file of all matched IDs
DATA matchfile ;
  SET allyes allno ;
And sort it …
proc sort data = matchfile ;
  by &id &depend ;
proc sort data = in&digits ;
  by &id &depend ;
DATA MATCHES&DIGITS IN&NEXTIN ;
  MERGE IN&DIGITS (IN = INA) MATCHFILE (IN = INB) ;
  BY &ID &DEPEND ;
  IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
  ELSE OUTPUT IN&NEXTIN ;
/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */
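The merge that sends matched subjects to MATCHES&DIGITS and everyone else to IN&NEXTIN can be tried on its own with hypothetical data at the 5-decimal level (digits = 5, nextin = 4). Subjects 1 and 2 were matched, so they land in work.matches5; subjects 3, 4 and 5 go to work.in4 for another try at 4 decimals.

```sas
/* Hypothetical full data set at this level (in5) */
DATA work.in5 ;
  INPUT id athome prob ;
  DATALINES ;
1 1 0.31
2 0 0.31
3 1 0.58
4 0 0.22
5 0 0.74
;
RUN ;

/* Hypothetical IDs matched at 5 decimals */
DATA work.matchfile ;
  INPUT id athome ;
  DATALINES ;
1 1
2 0
;
RUN ;

/* Matched records go to matches5; everyone else is retried at 4 decimals */
DATA work.matches5 work.in4 ;
  MERGE work.in5 (IN = ina) work.matchfile (IN = inb) ;
  BY id athome ;
  IF ina AND inb THEN OUTPUT work.matches5 ;
  ELSE OUTPUT work.in4 ;
RUN ;
```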
TITLE "MATCHES &ROUNDTO" ;
PROC FREQ DATA = MATCHES&DIGITS ;
  TABLES &DEPEND ;
RUN ;
%END ;
JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH
End of loop. The next pass matches the remaining subjects to 4 decimal places, and so on.
/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset ;
  set %do m = 1 %to 5 ; matches&m %end ; ; /* the last ; ends the SET statement */
One final check & done!
Title "Distribution of Dependent Variable in &lib..finalset" ;
proc freq data = &lib..finalset ;
  tables &depend ;
run ;
%mend propen ;
run ;
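The PROC FREQ tables in the macro are meant to be checked by eye. Because every match pairs one case with one control (assuming subject IDs are unique), the final set should hold equal numbers of each. A hypothetical programmatic version of that check, using a four-row stand-in for &lib..finalset and athome as the dependent variable, might look like this:

```sas
/* Hypothetical stand-in for &lib..finalset: two matched pairs */
DATA work.finalset ;
  INPUT id athome ;
  DATALINES ;
1 1
2 0
3 1
7 0
;
RUN ;

/* With 1:1 matching, cases and controls should be equal in number */
PROC SQL NOPRINT ;
  SELECT SUM(CASE WHEN athome = 1 THEN 1 ELSE 0 END),
         SUM(CASE WHEN athome = 0 THEN 1 ELSE 0 END)
    INTO :ncases TRIMMED, :ncontrols TRIMMED
    FROM work.finalset ;
QUIT ;

%put NOTE: cases=&ncases controls=&ncontrols ;
```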
Did it work?

Variable      QUINTILES                      NEAREST NEIGHBOR
              At Home    Not Home   Prob     At Home   Not Home   Prob
Age           79.2       79.3       .60      79.1      79.1       .76
ER visits     4.5 ****   3.8 ****   .0001    4.2       4.2        .88
Female        52%        54%        .36      50%       50%        .74
Race                                .97                           .67

** P < .01   **** P < .0001
Model Comparison

Test               Without Matching   Quintile Matching   Nearest Neighbor
Likelihood Ratio   643.1              180.8               186.6
Score              582.4              176.0               181.4
Wald               485.6              165.7               170.4
Odds ratio

No Match   Quintiles   Nearest Neighbor
.154       .281        .269
6.5 : 1    3.6 : 1     3.7 : 1
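The second row of the odds-ratio table appears to be the reciprocal of the first, restating each odds ratio as "X to 1". The arithmetic checks out:

```latex
\frac{1}{0.154} \approx 6.49 \approx 6.5, \qquad
\frac{1}{0.281} \approx 3.56 \approx 3.6, \qquad
\frac{1}{0.269} \approx 3.72 \approx 3.7
```

So the ratio drops from about 6.5 : 1 without matching to about 3.6 : 1 or 3.7 : 1 with either matching method.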
How near?
Decimals   # Matches
5 9024 143 1432 1011 38