Integrated Public Use Microdata Series (IPUMS), www.ipums.org, Matt Sobek, Minnesota Population Center, sobek@pop.umn.edu

Upload: hillary-morton

Post on 14-Jan-2016


TRANSCRIPT

Page 1: Integrated Public Use Microdata Series IPUMS Matt Sobek Minnesota Population Center sobek@pop.umn.edu

Integrated Public Use Microdata Series

IPUMS

www.ipums.org

Matt Sobek
Minnesota Population Center

sobek@pop.umn.edu

Page 2: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples

Page 3: Brief History

IPUMS-USA 1991 -- Steve Ruggles
• All existing samples of US census
• Data extraction system 1998

IPUMS-International 2001
• 2004 IPUMS-Latin America
• 2005 IPUMS-Europe
• 2005 NSF Expansion

World’s largest collection of census data
• 30 samples per year for the next 3 years
• Bob McCaa

Page 4: Datasets in IPUMS

Belarus: 1999
Brazil: 2000, 1991, 1980, 1970, 1960
Cambodia: 1998
Chile: 2002, 1992, 1982, 1970, 1960
China: 1982
Colombia: 1993, 1985, 1973, 1964
Costa Rica: 2000, 1984, 1973, 1960
Ecuador: 2001, 1990, 1982, 1974, 1962
France: 1990, 1982, 1975, 1968, 1962
Greece: 2001, 1991, 1981, 1971
Kenya: 1999, 1989
Mexico: 2000, 1990, 1970, 1960
Philippines: 2000, 1995, 1990
Romania: 2002, 1992
South Africa: 2001, 1996
Spain: 2001, 1991, 1981
Uganda: 2002, 1991
United States: 2000, 1990, 1980, 1970, 1960
Venezuela: 1990, 1981, 1970
Vietnam: 1999, 1989

Page 5: IPUMS Census Sample Holdings and Release Dates

June 2007 December 2007

1970 Argentina 1971 Austria 2001 Armenia 1983 Guinea 1971 Nicaragua

1980 Argentina 1981 Austria 1976 Bolivia 1996 Guinea 1973 Pakistan

1991 Argentina 1991 Austria 1992 Bolivia 1961 Honduras 1981 Pakistan

2001 Argentina 2001 Austria 2001 Bolivia 1974 Honduras 1998 Pakistan

1970 Hungary 1971 Canada 2005 Colombia 1988 Honduras 1962 Paraguay

1980 Hungary 1981 Canada 1991 Czech Republic 1971 Indonesia 1972 Paraguay

1990 Hungary 1991 Canada 2001 Czech Republic 1976 Indonesia 1982 Paraguay

2001 Hungary 1970 Malaysia 1960 Dominican Rep 1980 Indonesia 1992 Paraguay

1972 Israel 1980 Malaysia 1970 Dominican Rep 1990 Indonesia 2002 Paraguay

1983 Israel 1991 Malaysia 1981 Dominican Rep 1995 Indonesia 1993 Peru

1995 Israel 2000 Malaysia 1986 Egypt 1997 Iraq 1970 Puerto Rico

1997 Palestine 1960 Panama 1996 Egypt 1961 Israel 1980 Puerto Rico

1981 Portugal 1970 Panama 1992 El Salvador 1991 Italy 1990 Puerto Rico

1991 Portugal 1980 Panama 1966 Fiji 1993 Madagascar 2000 Puerto Rico

2001 Portugal 1990 Panama 1986 Fiji 1987 Malawi 1983 Sudan

1991 Rwanda 2000 Panama 1996 Fiji 1998 Malawi 1993 Sudan

2001 Rwanda 2005 United States 1999 France 1987 Mali 1995 Turkmenistan

2001 Venezuela 1964 Guatemala 1998 Mali 1991 United Kingdom

1973 Guatemala 2002 Mongolia 1963 Uruguay

1981 Guatemala 1960 Netherlands 1975 Uruguay

1994 Guatemala 1970 Netherlands 1985 Uruguay

2002 Guatemala 2001 Netherlands 1996 Uruguay

June 2008 to June 2009

Page 6: IPUMS Global Coverage

[Figure: world map shaded by status]

Dark green = disseminating
Medium green = data held by IPUMS
Light green = negotiating
Yellow = not negotiating

Page 7: Selected Variable Availability -- PERSON

Variable Name                BR  CL  CN  CO  CR  EC  FR  KE  MX  ZA  UG  US  VE  VN
Relationship to hh head      X   X   X   X   X   X   X   X   X   X   X   X   X   X
Age                          X   X   X   X   X   X   X   X   X   X   X   X   X   X
Sex                          X   X   X   X   X   X   X   X   X   X   X   X   X   X
Marital status               X   X   X   X   X   X   X   X   X   X   X   X   X   X
Children ever born           X   X   X   x   x   x   .   X   X   X   X   x   X   X
Children surviving           X   x   X   x   x   x   .   X   .   X   X   .   x   X
Date of last birth           x   .   .   .   .   .   .   X   .   .   X   .   .   X
Country of birth             X   x   .   X   X   x   .   X   X   X   X   X   X   .
Nativity                     X   X   .   X   X   X   X   X   X   X   X   X   X   .
Religion                     X   X   .   .   .   .   .   .   X   X   X   .   .   x
School attendance            X   X   .   x   x   x   x   X   x   X   X   X   X   X
Education attainment         X   X   X   X   X   X   X   X   X   X   X   X   X   X
Years of schooling           X   X   .   X   X   X   .   X   X   X   X   x   X   X
Literacy                     X   X   X   X   X   X   .   x   X   .   X   .   X   X
Employment status            X   X   X   X   X   X   X   X   x   X   X   X   X   X
Class of worker              X   X   .   X   x   X   X   x   X   X   X   X   X   .
Occupation                   X   X   X   x   x   X   X   x   X   X   X   X   .   X
Industry                     X   X   X   X   X   x   X   .   X   X   x   X   X   X
Income                       X   .   .   x   X   .   .   .   X   X   .   X   X   .
Migration, previous country  X   x   .   X   X   x   .   X   X   x   X   x   .   .
Migration, internal          X   X   .   X   X   x   X   X   X   X   X   X   X   X
Year of migration            X   X   .   X   X   x   .   .   x   X   X   .   x   .
Disability                   x   x   .   .   x   .   x   .   .   X   X   x   X   .

X = available in all samples for that country
x = available in only some samples for that country
. = not available for that country

BR=Brazil; CL=Chile; CN=China; CO=Colombia; CR=Costa Rica; EC=Ecuador; FR=France; KE=Kenya; MX=Mexico; ZA=South Africa; UG=Uganda; US=United States; VE=Venezuela; VN=Vietnam

Page 8: Selected Variable Availability -- HOUSEHOLD

Variable Name                BR  CL  CN  CO  CR  EC  FR  KE  MX  ZA  UG  US  VE  VN
Region                       X   X   X   X   .   .   X   X   X   .   .   X   .   X
State/province               X   X   X   X   X   X   .   X   X   X   X   X   X   X
District/county/municip      .   X   X   X   X   X   .   X   X   X   X   X   X   X
Metropolitan area            X   .   .   X   .   .   .   .   .   .   .   X   .   .
Urban-rural status           X   X   .   X   X   x   X   X   X   X   X   x   x   X
Electricity                  X   X   .   x   X   X   x   X   x   X   X   .   X   X
Water                        X   X   .   x   X   X   .   X   X   X   X   X   X   X
Sewage                       X   X   .   x   X   x   x   X   X   x   .   X   X   X
Toilet                       .   X   .   x   X   X   .   X   X   X   .   x   X   X
Home ownership               X   X   .   x   X   X   x   X   X   X   X   X   X   x

X = available in all samples for that country
x = available in only some samples for that country
. = not available for that country

BR=Brazil; CL=Chile; CN=China; CO=Colombia; CR=Costa Rica; EC=Ecuador; FR=France; KE=Kenya; MX=Mexico; ZA=South Africa; UG=Uganda; US=United States; VE=Venezuela; VN=Vietnam

Page 9: What Are Microdata?

Individual-level data

• every record represents a separate person
• all of their individual characteristics are recorded
• users must manipulate the data themselves

Different from aggregate/summary/tabular data

• a count of persons by municipality
• an employment status table by sex from a published census volume
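To make the contrast concrete, here is a minimal Python sketch of how the two aggregate tables named above could be produced from individual-level records. The records, the field names (municipality, sex, empstat) and the tabulate helper are all invented for illustration; they are not IPUMS variables or IPUMS software.

```python
from collections import Counter

# Hypothetical person records: one dict per person (values invented for illustration).
persons = [
    {"municipality": "A", "sex": "female", "empstat": "employed"},
    {"municipality": "A", "sex": "male", "empstat": "employed"},
    {"municipality": "A", "sex": "female", "empstat": "inactive"},
    {"municipality": "B", "sex": "male", "empstat": "unemployed"},
    {"municipality": "B", "sex": "female", "empstat": "employed"},
]


def tabulate(records, *keys):
    """Count records for each combination of the named variables."""
    return Counter(tuple(r[k] for k in keys) for r in records)


# Aggregate table 1: a count of persons by municipality.
count_by_municipality = tabulate(persons, "municipality")

# Aggregate table 2: employment status by sex.
empstat_by_sex = tabulate(persons, "empstat", "sex")

for cell, n in sorted(empstat_by_sex.items()):
    print(cell, n)
```

A published volume fixes the choice of table in advance; with the person records in hand, any other combination of variables can be passed to the same function.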

Page 10: Kenya 1999 Census Questionnaire

Page 11: Raw Census Microdata from IPUMS

H9101000000030982025200090000001324101001000071000000008800000000
P9101000000030102520252120000000002109730111020010103212001182000
P9101000000030202520252120000000001109730111020020103622001181080
P9101000000030302520252120201010100009000199996030101122006990000
P9101000000030402520252120201010100009000199996030100912006990000
P9101000000030502520252120201010100009000199996030100712006990000
P9101000000030602520252120201010100009000199996030100612006990000
P9101000000030702520252120201010100009000199996030100422006990000
P9101000000030802520252120201010100009000199996030100322006990000
P9101000000030902520252120201010100009000199996030100222006990000
H9101000000040360025200030000001324101001000071000000008800000000
P9101000000040102520252120000000002103110101010010103011001021000
P9101000000040202520252120000000001103110101010020102121001021020
P9101000000040302520252120201010100003000199990030100111006990000
H9101000000050338025200030000001324101001000071000000008800000000
P9101000000050102520251200000000021031001070700101045120010520000
P9101000000050202520252120000000001103100107070020102522001051020
P9101000000050302520252120201010100003000199990030100722006990000
H9101000000060416025200040000001324101001000071000000008800000000
P9101000000060102520252120000000002104200119150010104912001192000
P9101000000060202520252120000000001104200119150020104922001192040
P9101000000060302520252120201010100004000199991030101922006990000
P9101000000060402520252120201010100004000199991030101522006990000

Page 12: IPUMS Data Structure

H910000240000000088001001000220100
P910000020101032120010010010011504
P910000010201036220010010010011999
P910201000301011220060010010011999
P910201000301009120060010010011999
P910201000301007120060010010011999
P910201000301006120060010010011999
P910201000301004220060010010011999
P910201000301003220060010010011999
P910201000301002220060010010011999
H910000240000000088001001000110100
P910000020101030110010290510511310
P910000010201021210010290290171999
P910201000301001110060010290291999
H910000240000000088001001000220100
P910000020101045120010010010011100
P910000010201025220010010010011820
P910201000301007220060010010011999
H910000240000000088001001000220100
P910000020101049120010010010011100
P910000010201049220010010010011820
P910201000301019220060010010011820
P910201000301015220060010010012820

Household record (shaded) followed by a person record for each member of the household

Column callouts: Relationship; Age; Sex; Race; Birthplace; Mother’s birthplace; Occupation

For each type of record, columns correspond to specific variables
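The hierarchical layout described on this slide can be read with a few lines of Python. The sketch below is only an illustration: the column positions, the variable names and the sample records are invented, since the slide does not give the actual layout. The only features taken from the slide are the leading H or P record type and the rule that person records follow their household record.

```python
# Hypothetical column layouts: variable name -> (start, end) offsets, end exclusive.
# These positions are assumptions for illustration, not the real IPUMS layout.
HOUSEHOLD_LAYOUT = {"serial": (1, 5), "region": (5, 7)}
PERSON_LAYOUT = {"serial": (1, 5), "pernum": (5, 7), "age": (7, 10), "sex": (10, 11)}


def parse_record(line, layout):
    """Slice one fixed-width record into integer-valued variables."""
    return {name: int(line[start:end]) for name, (start, end) in layout.items()}


def read_households(lines):
    """Group each household (H) record with the person (P) records that follow it."""
    households = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line[0] == "H":
            household = parse_record(line, HOUSEHOLD_LAYOUT)
            household["persons"] = []
            households.append(household)
        elif line[0] == "P":
            if not households:
                raise ValueError("person record found before any household record")
            households[-1]["persons"].append(parse_record(line, PERSON_LAYOUT))
        else:
            raise ValueError(f"unknown record type: {line[0]!r}")
    return households


# Invented sample records that follow the hypothetical layouts above.
sample = [
    "H000103",
    "P0001010461",
    "P0001020442",
    "H000207",
    "P0002010532",
]

households = read_households(sample)
for hh in households:
    print(hh["serial"], hh["region"], [p["age"] for p in hh["persons"]])
```

Keeping persons nested under their household preserves the link between the two record types, which is what later makes household-level and co-resident measures possible.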

Page 13: The Advantages of Microdata

Combination of all of a person’s characteristics

Characteristics of everyone with whom a person lived

Freedom to make any table you need

Freedom to make models examining multivariate relationships

Basically, you are only limited by the questions asked in the particular census
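As one illustration of using the characteristics of everyone with whom a person lived, the Python sketch below flags each person who shares a household with someone else aged 65 or older. The records, the field names (serial, pernum, age) and the age threshold parameter are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical person records; serial identifies the household, pernum the person.
persons = [
    {"serial": 1, "pernum": 1, "age": 46},
    {"serial": 1, "pernum": 2, "age": 44},
    {"serial": 1, "pernum": 3, "age": 77},
    {"serial": 1, "pernum": 4, "age": 15},
    {"serial": 2, "pernum": 1, "age": 53},
    {"serial": 2, "pernum": 2, "age": 28},
]


def lives_with_elderly(records, threshold=65):
    """Return {(serial, pernum): True/False}: does the person live with
    at least one OTHER household member aged `threshold` or older?"""
    ages_by_household = defaultdict(list)
    for r in records:
        ages_by_household[r["serial"]].append(r["age"])

    flags = {}
    for r in records:
        elderly_in_household = sum(a >= threshold for a in ages_by_household[r["serial"]])
        # Do not count the person as their own elderly co-resident.
        if r["age"] >= threshold:
            elderly_in_household -= 1
        flags[(r["serial"], r["pernum"])] = elderly_in_household > 0
    return flags


flags = lives_with_elderly(persons)
share = sum(flags.values()) / len(flags)
print(f"{share:.0%} of persons live with an elderly person")
```

A measure like this cannot be derived from published tables, because it depends on linking each person to the other members of the same household.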

Page 14: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples

Page 15: Translation Table – Marital Status (IPUMS-International)

MARST Marital Status

code  label                     China 1982       Colombia 1973       Kenya 1989     Mexico 1970            U.S.A. 1990
                                CN82A403         CO73A411            KN89A413       MX70A402               US90A425
100   SINGLE/NEVER MARRIED      1=never married  4=single            1=single       9=single               6=never married
200   MARRIED/IN UNION          -                -                   -              -                      -
210   Married (not specified)   2=married        2=married           3=monogamous   -                      1=married
211   Civil                     -                -                   -              3=only civil           -
212   Religious                 -                -                   -              4=only religious       -
213   Civil and religious       -                -                   -              2=civil and religious  -
214   Polygamous                -                -                   3=polygamous   -                      -
220   Consensual union          -                1=free union        -              5=free union           -
300   SEPARATED/DIVORCED        -                3=sep. or divorced  -              -                      -
310   Separated                 -                -                   6=separated    8=separated            3=separated
321   Legally separated         -                -                   -              -                      -
322   De facto separated        -                -                   -              -                      -
330   Divorced                  4=divorced       -                   5=divorced     7=divorced             4=divorced
400   WIDOWED                   3=widowed        5=widowed           4=widowed      6=widowed              5=widowed
999   UNKNOWN/MISSING           0=missing        6=unknown           B=blank        1=unknown              -

Page 16: Translation Table – Marital Status (General Codes)

MARST Marital Status

gen  code  label                     CN82A403         CO73A411            KN89A413       MX70A402               US90A425
1    100   SINGLE/NEVER MARRIED      1=never married  4=single            1=single       9=single               6=never married
2    200   MARRIED/IN UNION          -                -                   -              -                      -
     210   Married (not specified)   2=married        2=married           3=monogamous   -                      1=married
     211   Civil                     -                -                   -              3=only civil           -
     212   Religious                 -                -                   -              4=only religious       -
     213   Civil and religious       -                -                   -              2=civil and religious  -
     214   Polygamous                -                -                   3=polygamous   -                      -
     220   Consensual union          -                1=free union        -              5=free union           -
3    300   SEPARATED/DIVORCED        -                3=sep. or divorced  -              -                      -
     310   Separated                 -                -                   6=separated    8=separated            3=separated
     321   Legally separated         -                -                   -              -                      -
     322   De facto separated        -                -                   -              -                      -
     330   Divorced                  4=divorced       -                   5=divorced     7=divorced             4=divorced
4    400   WIDOWED                   3=widowed        5=widowed           4=widowed      6=widowed              5=widowed
9    999   UNKNOWN/MISSING           0=missing        6=unknown           B=blank        1=unknown              -
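One way to picture the translation table is as a lookup from (sample, original code) to a harmonized three-digit code, with the general code taken as the leading digit. The Python sketch below is an illustration under assumptions: it covers only three of the samples, the code-to-column assignments follow one reading of the table above, and the dictionary and function names are invented. It is not the IPUMS implementation.

```python
# (sample variable, original code) -> harmonized detailed code.
# Entries follow one reading of the marital status translation table.
TRANSLATION = {
    # China 1982
    ("CN82A403", "1"): 100,
    ("CN82A403", "2"): 210,
    ("CN82A403", "3"): 400,
    ("CN82A403", "4"): 330,
    ("CN82A403", "0"): 999,
    # Colombia 1973
    ("CO73A411", "4"): 100,
    ("CO73A411", "2"): 210,
    ("CO73A411", "1"): 220,
    ("CO73A411", "3"): 300,
    ("CO73A411", "5"): 400,
    ("CO73A411", "6"): 999,
    # U.S.A. 1990
    ("US90A425", "6"): 100,
    ("US90A425", "1"): 210,
    ("US90A425", "3"): 310,
    ("US90A425", "4"): 330,
    ("US90A425", "5"): 400,
}


def harmonize(sample, source_code):
    """Return (general code, detailed code) for one original value."""
    detailed = TRANSLATION[(sample, source_code)]
    general = detailed // 100  # leading digit: 1, 2, 3, 4 or 9
    return general, detailed


# The same original code "1" means different things in different samples.
for sample in ("CN82A403", "CO73A411", "US90A425"):
    print(sample, harmonize(sample, "1"))
```

The detailed code keeps distinctions that only some censuses make (for example Colombia's free unions), while the general code stays comparable across all samples.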

Page 17: Variable Description: Literacy (International)

Page 18: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples

Page 19: IPUMS “Pointer” Variables (Simple household)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse’s  Mother’s  Father’s
                                              Location  Location  Location
1       head    46   male    married  n/a     2         0         0
2       spouse  44   female  married  3       1         0         0
3       aunt    77   female  widow    7       0         0         0
4       child   15   female  single   0       0         2         1
5       child   13   female  single   n/a     0         2         1
6       child   11   male    single   n/a     0         2         1

Page 20: IPUMS “Pointer” Variables (Complex household)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse’s  Mother’s  Father’s
                                                      Location  Location  Location
1       head          53   female  separated  6       0         0         0
2       child         28   male    single     n/a     0         1         0
3       child         22   male    single     n/a     0         1         0
4       child         21   male    single     n/a     0         1         0
5       child         25   female  married    2       6         1         0
6       child-in-law  28   male    married    n/a     5         0         0
7       grandchild    3    male    single     n/a     0         5         6
8       grandchild    1    male    single     n/a     0         5         6
9       non-relative  32   female  separated  2       0         0         0
10      non-relative  10   male    single     n/a     0         9         0
11      non-relative  5    female  single     n/a     0         9         0
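A pointer variable holds the person number (Pernum) of a relative within the same household, so a relative's characteristics can be attached with a simple lookup. The Python sketch below illustrates this with the complex household from this slide. The field names sploc, momloc and poploc are labels chosen for the example, the location values follow one reading of the slide, and 0 is treated as "no such person in the household".

```python
# One tuple per person: (pernum, relationship, age, sploc, momloc, poploc).
# Values follow one reading of the complex-household slide; 0 = not present.
ROWS = [
    (1, "head", 53, 0, 0, 0),
    (2, "child", 28, 0, 1, 0),
    (3, "child", 22, 0, 1, 0),
    (4, "child", 21, 0, 1, 0),
    (5, "child", 25, 6, 1, 0),
    (6, "child-in-law", 28, 5, 0, 0),
    (7, "grandchild", 3, 0, 5, 6),
    (8, "grandchild", 1, 0, 5, 6),
    (9, "non-relative", 32, 0, 0, 0),
    (10, "non-relative", 10, 0, 9, 0),
    (11, "non-relative", 5, 0, 9, 0),
]
FIELDS = ("pernum", "relate", "age", "sploc", "momloc", "poploc")
household = [dict(zip(FIELDS, row)) for row in ROWS]


def attach_relative_age(members, pointer):
    """Map each pernum to the age of the relative named by `pointer`,
    or None when the pointer is 0 (relative not in the household)."""
    by_pernum = {m["pernum"]: m for m in members}
    return {
        m["pernum"]: by_pernum[m[pointer]]["age"] if m[pointer] else None
        for m in members
    }


mother_age = attach_relative_age(household, "momloc")
spouse_age = attach_relative_age(household, "sploc")

for m in household:
    print(m["pernum"], m["relate"], "mother's age:", mother_age[m["pernum"]])
```

Because the pointers link persons 10 and 11 to person 9, the example recovers a mother-and-children group among the non-relatives, a family that the relationship-to-head variable alone does not reveal.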

Page 21: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Dissemination

Page 22: IPUMS Access

• Restricted access

• Scholarly and educational purposes

• Conditions of use: key is not to redistribute

• Serious vetting

Page 23: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Research examples

Page 24: 4 Key Strengths of the Census Microdata Samples

• National in scope
  Results not subject to local peculiarities
  Provide context for local studies

• Large
  More cases than any comparable datasets
  Enable study of relatively small populations

• Temporal depth
  Provide historical perspective

• Microdata
  Can make your own tabulations
  Apply multivariate techniques

Page 25: Limitations of the Microdata Samples

Confidentiality

• Geography
  20,000 population or larger

• Sensitive variables, swapping, etc.

• Samples
  Too small to answer some questions

Page 26: Other Issues and Limitations

• Not annual
  Any historical analysis will have gaps

• Cross-sectional data
  Not longitudinal

• Need knowledge of a statistical package

• User burden
  Information overload; culturally specific knowledge

• Very large extracts

Page 27: IPUMS Overview

1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Users and Access
5. Strengths and Limitations
6. Research examples

Page 28: IPUMS-International Research Topics

• Child labor outside the household in Mexico and Colombia

• Effect of NAFTA on educational attainment and school enrollment by region within Mexico

• Concentration of mortality within families in Kenya

• Life course patterns of co-residence among Mexicans in Mexico, Mexicans in the U.S., and Mexican Americans

• Brain drain from developing countries

• How language diversity is affected by migration and economic factors

Page 29: Married Female Labor Force Participation in Latin America (age 18 to 65)

[Figure: line chart. Y-axis: percent in labor force (0 to 50). X-axis: year (1960 to 2005). Series: Mexico, Costa Rica, Ecuador, Chile, Venezuela, Colombia, Brazil.]

Page 30: Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65)

[Figure: line chart. Y-axis: percent in labor force (0 to 70). X-axis: year (1920 to 2010). Series: Latin America, United States.]

Page 31: Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65)

[Figure: line chart. Y-axis: percent in labor force (0 to 70). X-axis: year (1920 to 2010). Series: United States, Mexico, Costa Rica, Ecuador, Chile, Venezuela, Colombia, Brazil. Callout: Compare Latin America to U.S. 40 years ago.]

Page 32: Married Female Labor Force Participation: Mexican-born Women, 1970-2000

[Figure: line chart. Y-axis: percent in labor force (0 to 70). X-axis: year (1970 to 2000). Series: Mexican-born women in United States, women in Mexico.]

Page 33: Working-Age Population in the Labor Force, by Sex

[Figure: bar chart. Y-axis: percent of working-age population (0 to 100). Bars for males and females, persons age 16 to 65, for each census sample: Brazil 1960, 1970, 1980, 1991, 2000; Chile 1960, 1970, 1982, 1992, 2002; Colombia 1964, 1973, 1985, 1993; Costa Rica 1963, 1973, 1984, 2000; Ecuador 1962, 1974, 1982, 1990, 2001; Mexico 1970, 1990, 2000; Venezuela 1971, 1981, 1990; China 1982; Vietnam 1989, 1999; Kenya 1989, 1999; South Africa 1996, 2001; France 1962, 1968, 1975, 1982, 1990; United States 1960, 1970, 1980, 1990, 2000.]

Page 34: Persons with Completed Secondary Education: National Populations Versus Migrants to the United States

[Figure: bar chart. Y-axis: percent (0 to 100). Countries: Brazil, Chile, Costa Rica, Ecuador, Mexico, Vietnam, Kenya, South Africa. Series: in home country, ca. 2000; migrants to U.S. 1995-2000.]

Page 35: Population Residing with an Elderly Person

[Figure: bar chart by census year (1960 to 2001) for Brazil, Colombia, Mexico, Kenya, South Africa, China, Vietnam, France and the United States. Y-axis: percent of total population (0 to 30). Series: elderly persons (age 65+); non-elderly residing with an elderly person.]

Page 36: End

sobek@pop.umn.edu