merging
INDIA HUMAN DEVELOPMENT SURVEY (IHDS) TRAINING PROGRAM MARCH 16, 2016
How to merge two rounds?
Merging Household Files
Relationship between IHDS-I and IHDS-II households
IHDS-I sample (N=41,554)
Reinterview households (N=34,621)
Split households from round 1 (N=5,397)
Attrition (N=6,911)
Replacement households in IHDS-II (N=2,134)
Most important concept in merging two data files
1. Some households in round 1 have no match in round 2, and vice versa
2. Some households in round 1 match more than one household in round 2
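The two points above can be seen in a small toy merge. This is only a sketch: the IDs, the variable names (hhid2005, hhsplitid, income2005, income2012) and the values are made up for illustration and are not IHDS data.

```stata
* Toy round 2 file: household 1 has split in two, and one household is new
* (its round 1 ID is missing). All names and values here are hypothetical.
clear
input hhid2005 hhsplitid income2012
1 1 500
1 2 300
2 1 400
. 1 250
end
tempfile r2
save `r2'

* Toy round 1 file: household 3 will not be found in round 2
clear
input hhid2005 income2005
1 450
2 380
3 290
end

* One round 1 household can match many round 2 households, hence 1:m
merge 1:m hhid2005 using `r2', gen(_mergetoy)

* 3 matched rows, 1 row from round 1 only, 1 row from round 2 only
tabulate _mergetoy
list, sepby(hhid2005)
```

Household 1 appears twice in the result, once for each split household, and its round 1 value is repeated on both rows.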
Any questions?
Who was chosen for reinterview?
Recontact rate of 83%? What does it mean?
How were replacement households chosen?
What is a split household?
What is needed to merge household files?
1. Round 1 household file – N=41,554
2. Round 2 household file – N=42,152 (Why are there more cases in round 2?)
3. Linking file – N=42,152 – gives Round 1 identification codes for all Round 2 households that were reinterviewed; linking codes are missing for the 2,134 households that are new
Step 1 – Link round 2 data to linking file to get round 1 ID
    use linkhh, clear
    sort STATEID DISTID PSUID HHID HHSPLITID
    * attach the round 1 IDs (HHID2005, HHSPLITID2005) to each round 2 household
    merge 1:1 STATEID DISTID PSUID HHID HHSPLITID using round2HH, gen(_mergeR2link)
    sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
    save round2HH_plus, replace
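A 1:1 merge stops with an error if the key variables do not uniquely identify rows, so it can help to check the key first. The sketch below uses a made-up three-row file in place of the real round 2 data; isid and duplicates report are standard Stata commands.

```stata
* Toy stand-in for a round 2 household file (values are hypothetical)
clear
input STATEID DISTID PSUID HHID HHSPLITID
1 1 1 10 1
1 1 1 10 2
1 1 2 10 1
end

* isid is silent when the variables uniquely identify rows, and errors otherwise
isid STATEID DISTID PSUID HHID HHSPLITID

* duplicates report shows how many rows share each key value
duplicates report STATEID DISTID PSUID HHID HHSPLITID
```

Dropping HHSPLITID from the list would make isid fail here, because household 10 in the first PSU appears twice.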
Step 2 – Merge this Round 2+ file with Round 1 file
    use round1HH, clear
    rename HHID HHID2005
    rename HHSPLITID HHSPLITID2005
    sort STATEID DISTID PSUID HHID2005 HHSPLITID2005
    * 1:m because one round 1 household can match several split households
    merge 1:m STATEID DISTID PSUID HHID2005 HHSPLITID2005 using round2HH_plus, gen(_mergeR1R2)
    sort STATEID DISTID PSUID HHID HHSPLITID
    save mergedHHR1R2, replace
Cases in the merged file are a superset
Households surveyed in both rounds: N=40,018
Households surveyed in round 1 only (attrition): N=6,911
Households surveyed in round 2 only (replacement): N=2,134
Total N=49,063
Keep only _mergeR1R2==3 for panel analysis (N=40,018)
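After keeping the matched cases, a round 1 household that split appears once per round 2 household. The sketch below shows one way to flag such cases on a made-up four-row stand-in for the merged file; nsplit and issplit are hypothetical names introduced here, not IHDS variables.

```stata
* Toy stand-in for the merged household file (values are hypothetical)
clear
input STATEID DISTID PSUID HHID2005 HHSPLITID2005 HHSPLITID _mergeR1R2
1 1 1 10 1 1 3
1 1 1 10 1 2 3
1 1 1 20 1 1 3
1 1 1 30 1 . 1
end

* Panel sample: households found in both rounds
keep if _mergeR1R2 == 3

* Count round 2 households per round 1 household
bysort STATEID DISTID PSUID HHID2005 HHSPLITID2005: gen nsplit = _N
gen byte issplit = nsplit > 1

tabulate issplit
```

Rows with issplit equal to 1 carry the same round 1 values more than once, which is worth keeping in mind when computing round 1 totals or means on the panel file.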
Merging Individual Files
Relationship between IHDS-I and IHDS-II individuals
IHDS-I sample (N=215,754)
Reinterview individuals (N=150,995)
HH attrition (N=29,299)
Individual attrition in interviewed HH (N=35,464)
New individuals in R1 HH (N=43,822)
New individuals, new HH (N=9,760)
Most important concept in merging two data files
1. Even reinterview households have new members (births, marriages)
2. Even reinterview households have some members who are no longer there (deaths, marriages, migration)
What is needed to merge individual files?
1. Round 1 individual file – N=215,754
2. Round 2 individual file – N=204,568 (Why are there fewer cases in round 2?)
3. Linking file – N=204,568 – gives Round 1 identification codes for all Round 2 individuals who were reinterviewed; linking codes are missing for individuals who are new, including members of the 2,134 new households
Step 1 – Link round 2 data to linking file to get round 1 ID
    use linkind, clear
    sort STATEID DISTID PSUID HHID HHSPLITID PERSONID
    * attach the round 1 IDs to each round 2 individual
    merge 1:1 STATEID DISTID PSUID HHID HHSPLITID PERSONID using round2IND, gen(_mergeR2link)
    sort STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005
    save round2IND_plus, replace
Step 2 – Merge this Round 2+ file with Round 1 file
    use round1IND, clear
    rename HHID HHID2005
    rename HHSPLITID HHSPLITID2005
    rename PERSONID PERSONID2005
    sort STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005
    merge 1:m STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID2005 using round2IND_plus, gen(_mergeR1R2)
    sort STATEID DISTID PSUID HHID HHSPLITID
    save mergedINDR1R2, replace
Cases in the merged file are a superset
Individuals surveyed in both rounds: N=150,988
Individuals surveyed in round 1 only (attrition/death/migration): N=64,766
Individuals surveyed in round 2 only (replacement/new): N=53,580
Total N=269,334
Keep only _mergeR1R2==3 for panel analysis (N=150,988)
Ever-married woman file linkage
Same process as individual file linkage.
One thing to note: there was no ever-married woman file for 2004-5, so you will be merging with the household file from 2004-5.
Merging Caution
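If a variable has the same name in both files, the merged file keeps only one copy (by default Stata keeps the master file's values and drops the using file's).
So if you want to keep variables from round 1 and round 2 separate, you may want to rename all round 1 variables before merging.
Typically we use the commands
    rename * x*
    rename xSTATEID STATEID
and so on for the other merging variables.
So xr05 will be age in 2004-5 and r05 will be age in 2011-12.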
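A minimal sketch of this renaming pattern on a made-up two-row file follows. The variables age and income are hypothetical stand-ins for round 1 survey variables, and the wildcard and list forms of rename assume Stata 12 or later.

```stata
* Toy round 1 file; age and income are hypothetical variable names
clear
input STATEID DISTID PSUID HHID HHSPLITID age income
1 1 1 10 1 34 450
1 1 1 20 1 51 380
end

* Prefix every variable with x
rename * x*

* Restore the original names of the merge keys in one command
rename (xSTATEID xDISTID xPSUID xHHID xHHSPLITID) (STATEID DISTID PSUID HHID HHSPLITID)

* Keys keep their names; everything else now starts with x
describe
```

In the full workflow the restored HHID and HHSPLITID would then be renamed to HHID2005 and HHSPLITID2005 before the merge with the Round 2+ file.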