Post on 21-Dec-2015
Original data. (various)
Reformat data:
• structural issues
• draw sample
• confidentiality
(general tools)
Data dictionary. (txt/pdf)
Enumeration forms and instructions. (pdf)
Sample designs, census info, etc. (pdf)
Collect metadata for input variables:
• codes
• labels (original language)
• labels (English)
• frequencies
(Excel, with Perl)
Convert to editable files:
• translate into English
• standardized layout
• standardized formatting
(Word)
Assemble codes, labels, and frequencies from source variables for harmonized translation tables. (automated)
Collect relevant enumeration text for harmonized variables. (automated)
Create files for public delivery. (pdf & generated HTML)
Create translation tables:
• recoding matrix
(Excel)
Variable descriptions:
• definition of variable
• universe
• comparability
• general & detailed
(Word)
Project-wide control files:
• countries
• samples
• variables
(Excel)
Create IPUMSI data:
• creation: Java
• reporting: Java
• testing: SPSS
• extraction: Java
IPUMSI web site. (Java & HTML)
Export IPUMSI metadata for use by major MPC programs. (transfer responsibility to IT)
Original materials / Prepare samples / Integration / Create IPUMSI
Collate sample information. (Word, tagged)
Collect codes, labels, and frequencies for ALL input variables. (automated)
Tag enumeration text to link it specifically with input variables.
Create translation tables:
• clean-up recoding only
• virtually no special programming
(Excel)
Variable descriptions:
• basic definition of variable
• universe
• cross references to enumeration text
(Word)
Create source variables
Data improvements:
• allocation
• logical editing
• pointers
Scripts for special programming (text)
Integrated variable list.
Integrated variable description.
Sample designs, etc.
Enumeration files in their entirety.
Codes and labels, with frequencies.
Documentation:
User experience of IPUMSI web site
Source variable list.
Source variable description (Java assembles tagged enumeration text).
Translation table.
Special programming.
Source variable metadata: frequencies, labels, and original-language labels.
Select samples.
Download extract:
• data
• syntax
• enhanced codebook
Data:
Select variables:
• integrated
• general or detailed
• source
Select features:
• case selection
• household aggregation
• attached characteristics
Registration:
• more rigorous vetting
• more automated registration processes
Access:
• user preferences
• registration expires (1 yr)
Registration:
Enumeration text specific to the variable. (assembled by Java)
How a case gets from the manuscript census into the IPUMS
An example from the 1860 census...
John C. Breckinridge of Kentucky
Vice President of the U.S., 1856-1860
Secretary of War, C.S.A., 1861-1865
Later charged with treason, fled to Cuba
Checked and coded data, ready for release (ca. 2001)
86510365530021011203411453142582488000000000000
86510365530022011202511036999995000000000000000
86510365530023011202611036159690379000000000000
86510365530024011203011453254591346000000000000
86510365530025011203011421167594308000000000000
86510365555020010103911021048250916030000003000
86510365555020020203321021999995000000000000000
86510365555020030301511021999995000000000000000
86510365555020040301311021999999000000000000000
86510365555020050301121021999999000000000000000
86510365555020060300911021999999000000000000000
86510365555020070300521021999999000000000000000
86510365567017010102411021083290516005000015000
86510365567017020201821021999995000000000000000
86510365567017030300311021999999000000000000000
86510365567017040703411021999995000001200006000
Fields called out on the slide: Year, Page, Relationship, Age, Occupation, Industry, Wealth
Reformat Rectangular Sample
(Brazil 1980)
(Person records only; household data duplicated on person records)
Before:
geography housing person (head)
geography housing person (child)
geography housing person (child)
geography housing person (head)
geography housing person (spouse)
geography housing person (child)
geography housing person (child)
After:
geography housing
person (head)
person (child)
person (child)
geography housing
person (head)
person (spouse)
person (child)
person (child)
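The rectangular-to-hierarchical step can be sketched in a few lines. This is only an illustration, not the project's actual tooling: the field names (`hh_id`, `geog`, `housing`, `relate`), the sample values, and the `H`/`P` record-type markers are all hypothetical, and the sketch assumes that rows belonging to one household are adjacent in the file.

```python
from itertools import groupby

# Hypothetical rectangular input: every person row repeats the
# geography and housing values of its household.
rectangular = [
    {"hh_id": 1, "geog": "A", "housing": "owned", "relate": "head"},
    {"hh_id": 1, "geog": "A", "housing": "owned", "relate": "child"},
    {"hh_id": 1, "geog": "A", "housing": "owned", "relate": "child"},
    {"hh_id": 2, "geog": "B", "housing": "rented", "relate": "head"},
    {"hh_id": 2, "geog": "B", "housing": "rented", "relate": "spouse"},
]

HOUSEHOLD_FIELDS = ("geog", "housing")


def to_hierarchical(rows):
    """Emit one household record followed by its person records."""
    out = []
    # groupby only groups adjacent rows, so the input must be sorted
    # (or already ordered) by household.
    for hh_id, group in groupby(rows, key=lambda r: r["hh_id"]):
        group = list(group)
        first = group[0]
        # Household-level values are taken from the first person row.
        out.append({"rectype": "H", "hh_id": hh_id,
                    **{f: first[f] for f in HOUSEHOLD_FIELDS}})
        for pernum, row in enumerate(group, start=1):
            out.append({"rectype": "P", "hh_id": hh_id,
                        "pernum": pernum, "relate": row["relate"]})
    return out


hierarchical = to_hierarchical(rectangular)
```

Taking the household values from the first person row is a simplification; a real conversion would presumably also check that the duplicated values agree across all rows of a household.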
Reformat Dwelling-Household-Person Sample
(Chile 1992)
(Separate dwelling and household records)
Before:
dwelling
household
person (head)
person (spouse)
person (child)
household
person (head)
person (child)
dwelling
household
person (head)
person (spouse)
After:
dwelling household
person (head)
person (spouse)
person (child)
dwelling household
person (head)
person (child)
dwelling household
person (head)
person (spouse)
Reformat Dwelling-Person Sample
(Colombia 1993)
(Multi-household dwellings; no separate household record)
Before:
dwelling 001
head
spouse
child
head
dwelling 002
head
child
After:
household 00101
head
spouse
child
household 00102
head
household 00201
head
child
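One way to read the dwelling-person example is that a new household starts at each head record within a dwelling, and that the household identifier is the dwelling number followed by a two-digit household counter. The slide does not state this rule, so the sketch below treats it as an assumption; the function name and the tuple layout are hypothetical.

```python
def assign_households(records):
    """Split dwellings into households.

    records: (dwelling_id, relationship) tuples in file order.
    Assumes every household begins with a "head" record.
    Returns (household_id, relationship) tuples.
    """
    out = []
    current_dwelling = None
    hh_in_dwelling = 0
    for dwelling, relate in records:
        if dwelling != current_dwelling:
            # New dwelling: restart the household counter.
            current_dwelling = dwelling
            hh_in_dwelling = 0
        if relate == "head" or hh_in_dwelling == 0:
            # A head (or the first record of a dwelling) opens a household.
            hh_in_dwelling += 1
        out.append((f"{dwelling}{hh_in_dwelling:02d}", relate))
    return out


records = [
    ("001", "head"), ("001", "spouse"), ("001", "child"), ("001", "head"),
    ("002", "head"), ("002", "child"),
]
households = assign_households(records)
```

With the records from the slide, the first three people fall in household 00101, the second head in 00102, and the two people in dwelling 002 in 00201.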
Merge Separate Household and Person Files
(Brazil 2000)
Household File
serial 001 geog & housing
serial 002 geog & housing
serial 003 geog & housing
Person File
serial 001 head
serial 001 spouse
serial 002 head
serial 002 child
serial 003 head
Merged:
serial 001 household
serial 001 head
serial 001 spouse
serial 002 household
serial 002 head
serial 002 child
serial 003 household
serial 003 head
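The merge can be sketched as a lookup of person records by serial number, interleaved under each household record. The tuple layouts and the function name are hypothetical, and the sketch assumes both files fit in memory and share a common `serial` key, as the slide suggests.

```python
# Hypothetical in-memory versions of the two files from the slide.
household_file = [
    ("001", "geog & housing"),
    ("002", "geog & housing"),
    ("003", "geog & housing"),
]
person_file = [
    ("001", "head"), ("001", "spouse"),
    ("002", "head"), ("002", "child"),
    ("003", "head"),
]


def merge_files(households, persons):
    """Return household records, each followed by its person records."""
    persons_by_serial = {}
    for serial, relate in persons:
        # Keep persons in their original file order within each serial.
        persons_by_serial.setdefault(serial, []).append(relate)

    merged = []
    for serial, hh_data in sorted(households):
        merged.append((serial, "household", hh_data))
        for relate in persons_by_serial.get(serial, []):
            merged.append((serial, "person", relate))
    return merged


merged = merge_files(household_file, person_file)
```

A household with no matching persons is still written out here; whether such cases should be kept or flagged is a decision this sketch does not make.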
Reformat Individual-level Data
(Mexico 1960)
(Individuals only; not organized in households)
Before:
geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person
After:
household
person
household
person
household
person
household
person
household
person
IPUMS “Pointer” Variables
(Colombia 1985)
(Simple household)
                                             Spouse's  Mother's  Father's
Pernum  Relate  Age  Sex     Marst    Chborn  Location  Location  Location
1       head    46   male    married  n/a     2         0         0
2       spouse  44   female  married  3       1         0         0
3       aunt    77   female  widow    7       0         0         0
4       child   15   female  single   0       0         2         1
5       child   13   female  single   n/a     0         2         1
6       child   11   male    single   n/a     0         2         1
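For a simple household, where every relationship is stated relative to the head, pointer values can be derived with a very small rule set. The sketch below is an illustration under that assumption only; it is not the IPUMS pointer logic, which has to handle complex households, and the function and field names are hypothetical.

```python
def simple_pointers(people):
    """Return {pernum: (spouse_loc, mother_loc, father_loc)}.

    Handles only head, spouse, and child-of-head relationships;
    everyone else gets 0 (no link) for all three pointers.
    """
    head = next((p for p in people if p["relate"] == "head"), None)
    spouse = next((p for p in people if p["relate"] == "spouse"), None)
    couple = [p for p in (head, spouse) if p is not None]
    # Parents of the head's children are the head and spouse, by sex.
    mother = next((p["pernum"] for p in couple if p["sex"] == "female"), 0)
    father = next((p["pernum"] for p in couple if p["sex"] == "male"), 0)

    pointers = {}
    for p in people:
        spouse_loc = mother_loc = father_loc = 0
        if p["relate"] == "head" and spouse is not None:
            spouse_loc = spouse["pernum"]
        elif p["relate"] == "spouse" and head is not None:
            spouse_loc = head["pernum"]
        elif p["relate"] == "child":
            mother_loc, father_loc = mother, father
        pointers[p["pernum"]] = (spouse_loc, mother_loc, father_loc)
    return pointers


# The simple household from the slide (Colombia 1985).
household = [
    {"pernum": 1, "relate": "head", "sex": "male"},
    {"pernum": 2, "relate": "spouse", "sex": "female"},
    {"pernum": 3, "relate": "aunt", "sex": "female"},
    {"pernum": 4, "relate": "child", "sex": "female"},
    {"pernum": 5, "relate": "child", "sex": "female"},
    {"pernum": 6, "relate": "child", "sex": "male"},
]
result = simple_pointers(household)
```

Here the head and spouse point to each other, each child points to person 2 as mother and person 1 as father, and the aunt has no links. A complex household needs more evidence (age, marital status, children ever born, record order) than this rule uses.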
IPUMS “Pointer” Variables
(Colombia 1985)
(Complex household)
                                                         Spouse's  Mother's  Father's
Pernum  Relationship  Age  Sex     Marst      Chborn     Location  Location  Location
1       head          53   female  separated  6          0         0         0
2       child         28   male    single     n/a        0         1         0
3       child         22   male    single     n/a        0         1         0
4       child         21   male    single     n/a        0         1         0
5       child         25   female  married    2          6         1         0
6       child-in-law  28   male    married    n/a        5         0         0
7       grandchild    3    male    single     n/a        0         5         6
8       grandchild    1    male    single     n/a        0         5         6
9       non-relative  32   female  separated  2          0         0         0
10      non-relative  10   male    single     n/a        0         9         0
11      non-relative  5    female  single     n/a        0         9         0
Translation Matrix – Marital Status
MARST Marital Status
code  label                    chn1982          col1973                  ken1989       mex1970                usa1990
                               P35..35          P17..17                  P24..24       P46..46                P59..59
100   SINGLE/NEVER MARRIED     1=never married  4=single                 1=Single      9=Soltero              6=Never married
200   MARRIED/IN UNION         -                -                        -             -                      -
210   Married (not specified)  2=married        2=married                2=Monogamous  -                      1=married, sp present
211   Civil                    -                -                        -             3=Only civil           -
212   Religious                -                -                        -             4=Only religious       -
213   Civil and religious      -                -                        -             2=Civil and religious  -
214   Polygamous               -                -                        3=Polygamous  -                      -
220   Consensual union         -                1=free union             -             5=Free union           -
300   SEPARATED/DIVORCED       -                -                        -             -                      -
310   Separated or divorced    -                3=separated or divorced  -             -                      -
320   Separated                -                -                        6=Separated   8=Separated            3=separated
321   Legally separated        -                -                        -             -                      -
322   De facto separated       -                -                        -             -                      -
330   Divorced                 4=divorced       -                        5=Divorced    7=Divorced             4=divorced
340   Married, spouse absent   -                -                        -             -                      2=married, sp absent
400   WIDOWED                  3=widowed        5=widowed                4=Widowed     6=Widowed              5=widowed
999   UNKNOWN/MISSING          0=missing        6=unknown                B=Blank       1=Unknown              -
How we integrate variables across countries and time
location of data in the original samples
marital codes used in the 1973 Colombian census
different original codes for “widowed” across the censuses
final IPUMS coding scheme for marital status
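Applying a translation matrix amounts to a per-sample lookup from the original code to the harmonized code. The sketch below uses the col1973 and usa1990 columns as read from the marital status matrix; the dictionary layout, the function name, and the choice to raise an error on unmapped codes are assumptions made for illustration, not the project's Java implementation.

```python
# Original code -> harmonized MARST code, per sample, transcribed
# from the col1973 and usa1990 columns of the translation matrix.
TRANSLATION = {
    "col1973": {"4": 100, "2": 210, "1": 220, "3": 310, "5": 400, "6": 999},
    "usa1990": {"6": 100, "1": 210, "2": 340, "3": 320, "4": 330, "5": 400},
}

# Harmonized labels for the codes used above.
LABELS = {
    100: "SINGLE/NEVER MARRIED",
    210: "Married (not specified)",
    220: "Consensual union",
    310: "Separated or divorced",
    320: "Separated",
    330: "Divorced",
    340: "Married, spouse absent",
    400: "WIDOWED",
    999: "UNKNOWN/MISSING",
}


def harmonize_marst(sample, source_code):
    """Translate one original marital status code to the harmonized code."""
    try:
        return TRANSLATION[sample][source_code]
    except KeyError:
        # Design choice for this sketch: fail loudly rather than guess.
        raise ValueError(f"no translation for {sample} code {source_code!r}")


# "Widowed" is code 5 in both samples, but "single" is 4 in one and 6 in the other.
widowed_col = harmonize_marst("col1973", "5")
single_col = harmonize_marst("col1973", "4")
single_usa = harmonize_marst("usa1990", "6")
```

Source codes are kept as strings because some samples use non-numeric values such as `B` for blank.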
Source variable translation table
CR840018 Marital status
code label cos1984 (P 75)
<tt>
0 NIU B=Under age 10
1 Consensual union 1=Consensual union
2 Married 2=Married
3 Separated 3=Separated
4 Divorced 4=Divorced
5 Widowed 5=Widowed
6 Single 6=Single
9 Missing 0=[undocumented]
9 " 8=[undocumented]
</tt>