
Post on 21-Dec-2015


Original data. (various)

Reformat data:
• structural issues
• draw sample
• confidentiality
(general tools)

Data dictionary. (txt/pdf)

Enumeration forms and instructions. (pdf)

Sample designs, census info, etc. (pdf)

Collect metadata for input variables:
• codes
• labels (original language)
• labels (English)
• frequencies
(Excel, with Perl)

Convert to editable files:

• translate into English

• standardized layout

• standardized formatting

(Word)

Assemble codes, labels, and frequencies from source variables for harmonized translation tables. (automated)

Collect relevant enumeration text for harmonized variables. (automated)

Create files for public delivery. (pdf & generated HTML)

Create translation tables:
• recoding matrix
(Excel)

Variable descriptions:
• definition of variable
• universe
• comparability
• general & detailed
(Word)

Project-wide control files:
• countries
• samples
• variables
(Excel)

Create IPUMSI data:
• creation: Java
• reporting: Java
• testing: SPSS
• extraction: Java

IPUMSI web site. (Java & HTML)

Export IPUMSI metadata for use by major MPC programs. (transfer responsibility to IT)

Original materials | Prepare samples | Integration | Create IPUMSI

Collate sample information. (Word, tagged)

Collect codes, labels, and frequencies for ALL input variables. (automated)
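The automated collection of codes and frequencies could look roughly like the sketch below. It is illustrative only: the function name `tabulate_frequencies`, the 1-based column convention, and the sample records are assumptions made for this example, not the project's actual tooling.

```python
from collections import Counter

def tabulate_frequencies(records, start, end):
    """Count each code found in columns start..end (1-based, inclusive)."""
    return Counter(rec[start - 1:end] for rec in records)

# Hypothetical fixed-width person records; column 5 is assumed to hold
# a one-digit marital-status code.
records = ["00014", "00022", "00034", "00042", "00051"]

freqs = tabulate_frequencies(records, 5, 5)
for code, count in sorted(freqs.items()):
    print(code, count)
```

Running the same tabulation over every input variable gives the code and frequency lists that are later paired with labels.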

Tag enumeration text to link it specifically with input variables.

Create translation tables:

• clean-up recoding only

• virtually no special programming

(Excel)

Variable descriptions:
• basic definition of variable
• universe
• cross references to enumeration text
(Word)

Create source variables

Data improvements:
• allocation
• logical editing
• pointers

Scripts for special programming (text)

Integrated variable list.

Integrated variable description.

Sample designs, etc.

Enumeration files in their entirety.

Codes and labels, with frequencies.

Documentation:

User experience of IPUMSI web site

Source variable list.

Source variable description (Java assembles tagged enumeration text).

Translation table.

Special programming.

Source variable metadata: frequencies, labels, and original-language labels.

Select samples.

Download extract:
• data
• syntax
• enhanced codebook

Data:

Select variables:
• integrated
• general or detailed
• source

Select features:
• case selection
• household aggregation
• attached characteristics

Registration:
• more rigorous vetting
• more automated registration processes

Access:
• user preferences
• registration expires (1 yr)

Registration:

Enumeration text specific to the variable.(assembled by Java)

How a case gets from the manuscript census into the IPUMS

An example from the 1860 census: John C. Breckinridge of Kentucky
• Vice President of the U.S., 1857-1861
• Secretary of War, C.S.A., 1865
• Later charged with treason, fled to Cuba

Original enumeration form from the 1860 U.S. Census

Data entry screen in Minnesota (ca. 1997)

Household and person record ready for checking (ca. 1999)

Coding dictionary for the occupation variable (ca. 2000)

Checked and coded data, ready for release (ca. 2001)

86510365530021011203411453142582488000000000000
86510365530022011202511036999995000000000000000
86510365530023011202611036159690379000000000000
86510365530024011203011453254591346000000000000
86510365530025011203011421167594308000000000000
86510365555020010103911021048250916030000003000
86510365555020020203321021999995000000000000000
86510365555020030301511021999995000000000000000
86510365555020040301311021999999000000000000000
86510365555020050301121021999999000000000000000
86510365555020060300911021999999000000000000000
86510365555020070300521021999999000000000000000
86510365567017010102411021083290516005000015000
86510365567017020201821021999995000000000000000
86510365567017030300311021999999000000000000000
86510365567017040703411021999995000001200006000

Fields labeled on the slide: Year, Industry, Page, Wealth, Age, Relationship, Occupation

Enumeration form: original file

Variable labels file

Data file: before reformatting

Data file: after reformatting

Reformat Rectangular Sample (Brazil 1980)
(Person records only; household data duplicated on person records)

Before reformatting:
geography housing person (head)
geography housing person (child)
geography housing person (child)
geography housing person (head)
geography housing person (spouse)
geography housing person (child)
geography housing person (child)

After reformatting:
geography housing
  person (head)
  person (child)
  person (child)
geography housing
  person (head)
  person (spouse)
  person (child)
  person (child)
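A minimal sketch of the rectangular-to-hierarchical step follows. It assumes that a new household begins at every head record and that each record can be split into geography, housing, and person parts; the function name and the tuple layout are hypothetical, chosen only to mirror the diagram.

```python
def rectangular_to_hierarchical(person_records):
    """Emit one household record per household, followed by its persons.

    Each input record is a (geography, housing, person) tuple. A household
    record ("H", ...) is written once, when a head is encountered
    (assumption: every household starts with its head).
    """
    output = []
    for geography, housing, person in person_records:
        if person == "head":
            output.append(("H", geography, housing))
        output.append(("P", person))
    return output

rectangular = [
    ("geo1", "house1", "head"),
    ("geo1", "house1", "child"),
    ("geo1", "house1", "child"),
    ("geo2", "house2", "head"),
    ("geo2", "house2", "spouse"),
    ("geo2", "house2", "child"),
    ("geo2", "house2", "child"),
]

hierarchical = rectangular_to_hierarchical(rectangular)
for record in hierarchical:
    print(record)
```

The seven person records become nine records: two household records carrying the geography and housing data once each, plus the seven persons.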

Reformat Dwelling-Household-Person Sample (Chile 1992)
(Separate dwelling and household records)

Before reformatting:
dwelling
  household
    person (head)
    person (spouse)
    person (child)
  household
    person (head)
    person (child)
dwelling
  household
    person (head)
    person (spouse)

After reformatting:
dwelling household
  person (head)
  person (spouse)
  person (child)
dwelling household
  person (head)
  person (child)
dwelling household
  person (head)
  person (spouse)
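One way to picture this step is to carry the most recent dwelling record forward and fold it into each household record that follows. The sketch below makes that assumption explicit; the record types, field names, and `merge_dwelling_into_households` are invented for illustration.

```python
def merge_dwelling_into_households(records):
    """Fold each dwelling record into the household records beneath it."""
    output = []
    current_dwelling = {}
    for rectype, fields in records:
        if rectype == "dwelling":
            current_dwelling = fields  # remember it, but do not emit it
        elif rectype == "household":
            combined = {**current_dwelling, **fields}
            output.append(("dwelling household", combined))
        else:
            output.append((rectype, fields))  # person records pass through
    return output

records = [
    ("dwelling", {"dwelling": 1}),
    ("household", {"household": 1}),
    ("person", {"relate": "head"}),
    ("person", {"relate": "spouse"}),
    ("person", {"relate": "child"}),
    ("household", {"household": 2}),
    ("person", {"relate": "head"}),
    ("person", {"relate": "child"}),
    ("dwelling", {"dwelling": 2}),
    ("household", {"household": 1}),
    ("person", {"relate": "head"}),
    ("person", {"relate": "spouse"}),
]

merged = merge_dwelling_into_households(records)
for rectype, fields in merged:
    print(rectype, fields)
```

Dwelling data is duplicated onto every household in a multi-household dwelling, so each household becomes self-contained.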

Reformat Dwelling-Person Sample (Colombia 1993)
(Multi-household dwellings; no separate household record)

Before reformatting:
dwelling 001
  head
  spouse
  child
  head
dwelling 002
  head
  child

After reformatting:
household 00101
  head
  spouse
  child
household 00102
  head
household 00201
  head
  child
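The diagram suggests that each head marks the start of a new household and that the household identifier is the dwelling number plus a two-digit sequence. The sketch below encodes exactly that reading; both rules are assumptions drawn from the diagram, and `split_dwelling_into_households` is a hypothetical name.

```python
def split_dwelling_into_households(dwelling_id, persons):
    """Start a new household at each head; return (household_id, members)."""
    households = []
    for relate in persons:
        # A head opens a new household; so does the first person, in case
        # a dwelling's records do not begin with a head.
        if relate == "head" or not households:
            number = len(households) + 1
            households.append((f"{dwelling_id}{number:02d}", []))
        households[-1][1].append(relate)
    return households

first = split_dwelling_into_households("001", ["head", "spouse", "child", "head"])
second = split_dwelling_into_households("002", ["head", "child"])
print(first)
print(second)
```

Applied to the two dwellings on the slide, this yields households 00101, 00102, and 00201.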

Merge Separate Household and Person Files (Brazil 2000)

Household File:
serial 001 geog & housing
serial 002 geog & housing
serial 003 geog & housing

Person File:
serial 001 head
serial 001 spouse
serial 002 head
serial 002 child
serial 003 head

Merged file:
serial 001 household
serial 001 head
serial 001 spouse
serial 002 household
serial 002 head
serial 002 child
serial 003 household
serial 003 head
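The merge can be sketched as a join on the serial number, writing each household record followed by its persons. The tuple layout and the name `merge_files` are assumptions for this example.

```python
def merge_files(household_file, person_file):
    """Interleave household and person records that share a serial number."""
    persons_by_serial = {}
    for serial, relate in person_file:
        persons_by_serial.setdefault(serial, []).append(relate)

    merged = []
    for serial, housing in sorted(household_file):
        merged.append((serial, "household", housing))
        for relate in persons_by_serial.get(serial, []):
            merged.append((serial, "person", relate))
    return merged

household_file = [
    ("001", "geog & housing"),
    ("002", "geog & housing"),
    ("003", "geog & housing"),
]
person_file = [
    ("001", "head"),
    ("001", "spouse"),
    ("002", "head"),
    ("002", "child"),
    ("003", "head"),
]

merged = merge_files(household_file, person_file)
for record in merged:
    print(record)
```

Persons keep the order in which they appear in the person file, so the head stays first within each household.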

Reformat Individual-level Data (Mexico 1960)
(Individuals only; not organized in households)

Before reformatting:
geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person

After reformatting:
household
  person
household
  person
household
  person
household
  person
household
  person

Enumeration form: editable file, in English

Variable description

Sample design

IPUMS “Pointer” Variables (Simple household)
(Colombia 1985)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse's Location  Mother's Location  Father's Location
1       head    46   male    married  n/a     2                  0                  0
2       spouse  44   female  married  3       1                  0                  0
3       aunt    77   female  widow    7       0                  0                  0
4       child   15   female  single   0       0                  2                  1
5       child   13   female  single   n/a     0                  2                  1
6       child   11   male    single   n/a     0                  2                  1

IPUMS “Pointer” Variables (Complex household)
(Colombia 1985)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse's Location  Mother's Location  Father's Location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  1                  0
3       child         22   male    single     n/a     0                  1                  0
4       child         21   male    single     n/a     0                  1                  0
5       child         25   female  married    2       6                  1                  0
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  5                  6
8       grandchild    1    male    single     n/a     0                  5                  6
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  9                  0
11      non-relative  5    female  single     n/a     0                  9                  0
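Pointers make it straightforward to attach one person's characteristics to another, as in the "attached characteristics" extract feature listed earlier. The sketch below uses a subset of the complex household, with pointer values as best they can be read from the slide (0 means no such person in the household). The field names `sploc`, `momloc`, `poploc` and the function `attach` are illustrative assumptions.

```python
# Persons 1, 5, 6, 7, and 8 from the complex household.
# Each pointer holds the Pernum of the spouse, mother, or father; 0 = none.
household = [
    {"pernum": 1, "relate": "head", "age": 53, "sploc": 0, "momloc": 0, "poploc": 0},
    {"pernum": 5, "relate": "child", "age": 25, "sploc": 6, "momloc": 1, "poploc": 0},
    {"pernum": 6, "relate": "child-in-law", "age": 28, "sploc": 5, "momloc": 0, "poploc": 0},
    {"pernum": 7, "relate": "grandchild", "age": 3, "sploc": 0, "momloc": 5, "poploc": 6},
    {"pernum": 8, "relate": "grandchild", "age": 1, "sploc": 0, "momloc": 5, "poploc": 6},
]

def attach(persons, pointer, field):
    """Return {pernum: value of `field` for the person `pointer` refers to}."""
    by_pernum = {p["pernum"]: p for p in persons}
    return {
        p["pernum"]: by_pernum[p[pointer]][field] if p[pointer] else None
        for p in persons
    }

mother_age = attach(household, "momloc", "age")
spouse_age = attach(household, "sploc", "age")
print(mother_age)
print(spouse_age)
```

Each grandchild picks up the age of person 5 as mother's age, and persons 5 and 6 pick up each other's ages as spouse's age.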

Project control file: variables

Translation table

MARST Marital Status
Translation Matrix – Marital Status
How we integrate variables across countries and time

code  label                    chn1982          col1973                  ken1989       mex1970                usa1990
-     (location in sample)     P35..35          P17..17                  P24..24       P46..46                P59..59
100   SINGLE/NEVER MARRIED     1=never married  4=single                 1=Single      9=Soltero              6=Never married
200   MARRIED/IN UNION         -                -                        -             -                      -
210   Married (not specified)  2=married        2=married                2=Monogamous  -                      1=married, sp present
211   Civil                    -                -                        -             3=Only civil           -
212   Religious                -                -                        -             4=Only religious       -
213   Civil and religious      -                -                        -             2=Civil and religious  -
214   Polygamous               -                -                        3=Polygamous  -                      -
220   Consensual union         -                1=free union             -             5=Free union           -
300   SEPARATED/DIVORCED       -                -                        -             -                      -
310   Separated or divorced    -                3=separated or divorced  -             -                      -
320   Separated                -                -                        6=Separated   8=Separated            3=separated
321   Legally separated        -                -                        -             -                      -
322   De facto separated       -                -                        -             -                      -
330   Divorced                 4=divorced       -                        5=Divorced    7=Divorced             4=divorced
340   Married, spouse absent   -                -                        -             -                      2=married, sp absent
400   WIDOWED                  3=widowed        5=widowed                4=Widowed     6=Widowed              5=widowed
999   UNKNOWN/MISSING          0=missing        6=unknown                B=Blank       1=Unknown              -

Callouts on the matrix:
• location of data in the original samples (the P35..35 row)
• marital codes used in the 1973 Colombian census (the col1973 column)
• different original codes for “widowed” across the censuses (the 400 row)
• final IPUMS coding scheme for marital status (the code and label columns)
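In code, a translation table amounts to a per-sample lookup from original code to harmonized code. The sketch below transcribes three of the five sample columns from the marital status matrix; the dictionary layout and the function name `harmonize` are assumptions for illustration, not the project's Java implementation.

```python
# Original code -> harmonized MARST code, per sample.
TRANSLATION = {
    "chn1982": {"1": 100, "2": 210, "3": 400, "4": 330, "0": 999},
    "col1973": {"1": 220, "2": 210, "3": 310, "4": 100, "5": 400, "6": 999},
    "usa1990": {"1": 210, "2": 340, "3": 320, "4": 330, "5": 400, "6": 100},
}

def harmonize(sample, original_code):
    """Look up the harmonized code; raises KeyError for an unlisted code."""
    return TRANSLATION[sample][original_code]

# "Widowed" has a different original code in each census
# but the same harmonized code.
for sample, code in [("chn1982", "3"), ("col1973", "5"), ("usa1990", "5")]:
    print(sample, code, harmonize(sample, code))
```

Raising on an unlisted code, rather than silently mapping it to 999, is a design choice of this sketch: it surfaces undocumented values for review.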

Source variable translation table

CR840018 Marital status (cos1984, P 75)

code  label             cos1984
0     NIU               B=Under age 10
1     Consensual union  1=Consensual union
2     Married           2=Married
3     Separated         3=Separated
4     Divorced          4=Divorced
5     Widowed           5=Widowed
6     Single            6=Single
9     Missing           0=[undocumented]
9     Missing           8=[undocumented]

Tagged enumeration form