The Integrated Public Use Microdata Series database (IPUMS), www.ipums.org

Lab 1: Background on the IPUMS and SPSS

Upload: dorcas-gaines

Post on 18-Jan-2016


Page 1: The Integrated Public Use Microdata Series database IPUMS Lab 1 Background on the IPUMS and SPSS

The Integrated Public Use Microdata Series database

IPUMS

www.ipums.org

Lab 1: Background on the IPUMS and SPSS

Page 2

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data

Page 3

Census Samples Included in the IPUMS

Household and person record counts are in thousands.

Year  Sample description                  Released  Density   Households  Persons  Variables  File size
1850  PUMS -- Free population             1994*     1 in 100  37          198      92         79 Mb
1860  PUMS -- Free and Slave population   2002*     1 in 100  66          354      94         141 Mb
1870  PUMS -- General sample              2002      1 in 100  80          428      94         170 Mb
1880  PUMS -- General sample              1994      1 in 100  107         503      123        204 Mb
1900  PUMS -- General sample              2002*     1 in 200  208         870      94         361 Mb
1910  PUMS -- General sample              1989*     1 in 250  113         480      125        198 Mb
1920  PUMS -- General sample              1998      1 in 100  257         1037     122        433 Mb
1940  PUMS -- General sample              1984      1 in 100  391         1351     174        584 Mb
1950  PUMS -- General sample              1984      1 in 100  461         1922     170        798 Mb
1960  PUMS -- General sample              1971*     1 in 100  579         1780     141        790 Mb
1970  PUMS -- Form 1 State sample         1972      1 in 100  744         2030     206        929 Mb
1970  PUMS -- Form 2 State sample         1972      1 in 100  744         2030     210        929 Mb
1970  PUMS -- Form 1 Metro sample         1972      1 in 100  744         2030     203        929 Mb
1970  PUMS -- Form 2 Metro sample         1972      1 in 100  744         2030     207        929 Mb
1970  PUMS -- Form 1 Neighborhood         1972      1 in 100  744         2030     260        1016 Mb
1970  PUMS -- Form 2 Neighborhood         1972      1 in 100  744         2030     264        1016 Mb
1980  PUMS -- 5% State sample             1983      1 in 20   4711        11337    276        5376 Mb
1980  PUMS -- 1% Metro sample             1983      1 in 100  942         2267     276        1075 Mb
1980  PUMS -- 1% Urban/rural sample       1983      1 in 100  942         2267     266        1075 Mb
1990  PUMS -- 5% State sample             1992      1 in 20   5528        12500    252        6039 Mb
1990  PUMS -- 1% Metro sample             1992      1 in 100  1106        2500     252        1208 Mb
1990  PUMS -- 1% Unweighted state         1995      1 in 100  1106        2500     252        1208 Mb
2000  C2SS -- 0.13% Sample                2003      1 in 750  158         372      258        207 Mb
2000  PUMS -- 1% Sample                   2003      1 in 100  1237        2819     256        1675 Mb

TOTAL file size: 27,372 Mb

* Preliminary sample available; larger sample currently being constructed.

Page 4

WHAT ARE MICRODATA?

Individual-level data

• every record represents a separate person
• all of their individual characteristics are recorded
• users must manipulate the data themselves

Different from aggregate/summary/tabular data

• a disability table from www.factfinder.census.gov
• an occupation table from a published census volume from the library

Page 5

1930 Census Population Schedule, made public April 2002

Page 6

H9101000000030982025200090000001324101001000071000000008800000000P9101000000030102520252120000000002109730111020010103212001182000P9101000000030202520252120000000001109730111020020103622001181080P9101000000030302520252120201010100009000199996030101122006990000P9101000000030402520252120201010100009000199996030100912006990000P9101000000030502520252120201010100009000199996030100712006990000P9101000000030602520252120201010100009000199996030100612006990000P9101000000030702520252120201010100009000199996030100422006990000P9101000000030802520252120201010100009000199996030100322006990000P9101000000030902520252120201010100009000199996030100222006990000H9101000000040360025200030000001324101001000071000000008800000000P9101000000040102520252120000000002103110101010010103011001021000P9101000000040202520252120000000001103110101010020102121001021020P9101000000040302520252120201010100003000199990030100111006990000H9101000000050338025200030000001324101001000071000000008800000000P9101000000050102520251200000000021031001070700101045120010520000P9101000000050202520252120000000001103100107070020102522001051020P9101000000050302520252120201010100003000199990030100722006990000H9101000000060416025200040000001324101001000071000000008800000000P9101000000060102520252120000000002104200119150010104912001192000P9101000000060202520252120000000001104200119150020104922001192040P9101000000060302520252120201010100004000199991030101922006990000P9101000000060402520252120201010100004000199991030101522006990000

Raw Census Microdata from IPUMS

Page 7

H910000240000000088001001000220100P910000020101032120010010010011504P910000010201036220010010010011999P910201000301011220060010010011999P910201000301009120060010010011999P910201000301007120060010010011999P910201000301006120060010010011999P910201000301004220060010010011999P910201000301003220060010010011999P910201000301002220060010010011999H910000240000000088001001000110100P910000020101030110010290510511310P910000010201021210010290290171999P910201000301001110060010290291999H910000240000000088001001000220100P910000020101045120010010010011100P910000010201025220010010010011820P910201000301007220060010010011999H910000240000000088001001000220100P910000020101049120010010010011100P910000010201049220010010010011820P910201000301019220060010010011820P910201000301015220060010010012820

IPUMS Data Structure

Labeled columns: Relationship, Age, Sex, Race, Birthplace, Mother’s birthplace, Occupation

Household record (shaded) followed by a person record for each member of the household

For each type of record, specific columns correspond to different variables
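The layout described above (a household record followed by one person record per household member, with variables at fixed column positions) can be read with a short loop. The Python sketch below is only an illustration, not IPUMS code. The one thing taken from the slide is that every record starts with H or P; the column positions in PERSON_LAYOUT and the toy records are hypothetical, and real positions come from the codebook or command file supplied with the data.

```python
from dataclasses import dataclass, field

# Hypothetical layout: variable name -> (start, end) as 0-based, end-exclusive
# column positions within a person record. Real positions come from the codebook.
PERSON_LAYOUT = {"age": (1, 4), "sex": (4, 5)}


@dataclass
class Household:
    raw: str                                      # the unparsed household record
    persons: list = field(default_factory=list)   # parsed person records


def slice_fields(record, layout):
    """Cut a fixed-width record into named string fields."""
    return {name: record[start:end] for name, (start, end) in layout.items()}


def read_hierarchical(lines):
    """Group each P record under the most recent H record."""
    households = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line[0] == "H":
            households.append(Household(raw=line))
        elif line[0] == "P":
            if not households:
                raise ValueError("person record found before any household record")
            households[-1].persons.append(slice_fields(line, PERSON_LAYOUT))
        else:
            raise ValueError(f"unknown record type: {line[0]!r}")
    return households


# Toy records, invented for illustration only.
toy_lines = ["H001", "P0301", "P0282", "H002", "P0451"]
households = read_hierarchical(toy_lines)
for hh in households:
    print(hh.raw, hh.persons)
```

Keeping the persons attached to their household is what later makes it possible to look at the characteristics of everyone a person lived with.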

Page 8

The Advantages of Microdata

Combination of all of a person’s characteristics

Characteristics of everyone with whom a person lived

Freedom to make any table you need

Freedom to make models to look at multivariate relationships
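As a rough illustration of "make any table you need", the sketch below cross-tabulates two variables directly from person records. It is written in Python rather than SPSS, and the variable names and records are invented for the example; they are not drawn from an IPUMS file.

```python
from collections import defaultdict


def crosstab(records, row_var, col_var):
    """Count records for every (row value, column value) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_var]][rec[col_var]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical person records with two variables each.
people = [
    {"sex": "F", "farm": "farm"},
    {"sex": "M", "farm": "farm"},
    {"sex": "F", "farm": "non-farm"},
    {"sex": "F", "farm": "farm"},
]

table = crosstab(people, "sex", "farm")
print(table)
```

Because every person's characteristics sit on one record, any pair of variables can be tabulated this way, including combinations that no published table reports.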

Page 9

INTEGRATION

What the IPUMS actually does to the original census samples

Page 10

IPUMS Translation Table for RACE

##
1850 P 18 18
1860 P 17 18
1870 P 17 18
1880 P 10 10
1900 P 12 12
1910 P 20 21
1920 P 18 19
1940 P 16 16
1950 P 22 22
1960 P 7 7
1970 P 7 7
1980 P 12 13
1990 P 12 14
##
IPUMS 1880 1900 1910 1940 1950 1960 1970 1980 1990
#
White 1 00 0 0 0 1 1 0 0 01 001
  Spanish write-in 1 10 12 *
Black/Negro 2 00 1 1 1 2 2 1 1 02 002
  Mulatto 2 10 2 2
American Indian 3 00 3 2 3 3 3 2 2 03
  Alaskan Athabaskan 3 01 301
  Apache 3 02 302
  Blackfoot 3 03 303
  Cherokee 3 04 304
  Cheyenne 3 05 305
  Chickasaw 3 06 306
  Chippewa 3 07 307
  Choctaw 3 08 308
  Comanche 3 09 309
  Creek 3 10 310
  . . .
  Aleut 3 30 * 005
  Eskimo 3 40 * 004
Chinese 4 00 4 4 5 5 5 4 4 05 006
  Taiwanese 4 10 007
Japanese 5 00 3 4 4 4 3 3 04 009

Original codes for “Black”

IPUMS assigned codes

Column location in original samples
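The idea behind a translation table can be sketched as a lookup from (sample, original code) to one harmonized code. This is a hedged illustration, not the IPUMS implementation. The harmonized (general, detail) codes for White, Black/Negro and Mulatto follow the table above, but the sample names "sample_a" and "sample_b" and their original codes are hypothetical placeholders, because the slide text does not preserve which original code belongs to which census year.

```python
# Harmonized (general, detail) codes, as listed in the translation table.
HARMONIZED = {
    "White": (1, 0),
    "Black/Negro": (2, 0),
    "Mulatto": (2, 10),
}

# Hypothetical original codings: two samples that use different codes
# for the same categories. These values are placeholders, not real codes.
ORIGINAL_TO_LABEL = {
    "sample_a": {"0": "White", "1": "Black/Negro", "2": "Mulatto"},
    "sample_b": {"1": "White", "2": "Black/Negro"},
}


def harmonize_race(sample, original_code):
    """Translate a sample-specific code into the harmonized (general, detail) pair."""
    try:
        label = ORIGINAL_TO_LABEL[sample][original_code]
    except KeyError:
        raise ValueError(f"no translation for code {original_code!r} in {sample!r}")
    return HARMONIZED[label]


# The same category gets the same harmonized code despite different original codes.
print(harmonize_race("sample_a", "1"))
print(harmonize_race("sample_b", "2"))
```

The two-part code lets an analyst compare the general category across all years while still keeping detail that only some samples record.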

Page 11

IPUMS Translation Table for RELATIONSHIP

##
1880 P 21 23
1900 P 09 11
1910 P 14 16
1920 P 11 14
1940 P 11 14
1950 P 16 20
1960 P 01 02
1970 P 01 02
1980 P 02 04
1990 P 09 10
##
IPUMS 1880 1900 1910 1940 1950 1960 1970 1980 1990
#
HEAD & RELATIVES:
Head/Householder 01 01 100 100 100 019901999 0- 0- 000 00 01 01 00 00
Spouse 02 01 120 120 120 029902999 1- 1- 010 01
  Husband, not Head 02 01 140 140
  2nd/3rd Wife (PG) 02 02 121 129
Child 03 01 130 130 130 039903999 2- 2- 020 02
  Incl Adopted, Step 03 01 20 20
  (1970 error) 03 01 22
  (1970 error) 03 01 26
  Adopted Child 03 02 132 132 132
  Stepchild 03 03 131 131 131 049904999 03
  Adopted, ns 03 04 280
Child-in-law 04 01 133 133 133 059905999 30 30 051 *
  Step Child-in-law 04 02 134 134
Parent 05 01 210 210 210 079907999 32 32 040 05
  Stepparent 05 02 211 211 211
Parent-in-Law 06 01 213 213 213 089908999 33 33 053 *
  Stepparent-in-law 06 02 214 214
Sibling 07 01 220 220 220 109910999 34 34 030 04
  Step/Half/Adopted 07 02 221 221 221
  07 02 222
  07 02 223
Sibling-in-Law 08 01 223 223 224 119911999 35 35 054 *
  Step/Half Sib-in-law 08 02 225
  08 02 222 226
Grandchild 09 01 270 270 270 069906999 31 31 052 06
  Adopted Grandchild 09 02 272 272 272
  Step Grandchild 09 03 271 271 271

Page 12

IPUMS Documentation: Farm Status Variable

FARM - H 78
Farm status

Availability:

1850 1860 1870 1880 1900 1910 1920 1940 1950 1960 1970 1980 1990

X X X X X X X X X X X X X

Universe: All households and group quarters.

Description/Comparability: FARM identifies farm households. Only units sampled as (non-vacant) households are eligible to be coded as farms (see GQ, p. 1.12.1). Census methods for defining and identifying farms have changed several times. A year-by-year discussion follows:

For 1850-1880, the IPUMS constructs FARM from occupational data. Any household containing a person with the occupation “farmer” is coded as a farm.

For 1900, the census counted a household as a farm if a member of the household operated a farm. It is not possible to tell whether or not the household actually lived on or owned the farm they operated in 1900 (or 1850-1880).

For 1910 and 1920, enumerators identified farms using the following criteria: any household located on either a tract of 3 or more acres used for any agricultural operations, regardless of the amount of labor or produce involved, or households on a tract of fewer than 3 acres which either yielded $250+ in produce sales in the previous year or employed at least one full-time farmer or agricultural laborer.

For 1940 and 1950, enumerators simply asked the respondent whether or not the house in which they lived was located on a farm.

For 1960 and 1970, a farm was either 1) a household on 10+ acres that yielded $50+ in produce, or 2) a household on fewer than 10 acres that yielded $250+ in produce. For 1970, vacant units and dwellings in city lots could not be farms.

For 1980 and 1990, a farm was any household on 1+ acres that yielded $1000+ in produce. Tenant families that paid cash rent were considered farm households if the parcel of land they farmed (their “yard”) met these criteria. Those that paid no cash rent were enumerated in the same way as owner-occupied farms. For both years, vacant units and those on urban lots could not be farms. 1980 also excluded households on suburban lots, and 1990 excluded multiple-unit dwellings.

Flags: QFARM, QACREPRO (1970), QFARMPRO (1970-1990)

Codes and Frequencies:

Category  Code  1850   1860  1870   1880   1900   1910   1920   1940    1950    1960    1970    1980    1990
Non-farm  1     19420  5992  11960  66601  19825  64458  95491  318264  389486  536861  719372  923614  1088581
Farm      2     17674  5398  6668   40475  7458   24356  33761  72770   71644   42351   25057   18600   17002
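To make the changing definitions concrete, the sketch below encodes only the acreage and produce thresholds quoted above for 1960-1970 and 1980-1990. It is a simplified illustration in Python, not the procedure used by the Census Bureau or by IPUMS. The function name and arguments are hypothetical, and the exclusions for vacant units, city or urban lots, suburban lots and multiple-unit dwellings are deliberately left out.

```python
def meets_farm_thresholds(year, acres, produce_dollars):
    """Apply the acreage/produce rules quoted in the FARM documentation.

    Covers 1960, 1970, 1980 and 1990 only. Exclusions (vacant units,
    urban or suburban lots, multiple-unit dwellings) are NOT modeled.
    """
    if year in (1960, 1970):
        # 10+ acres needs $50+ in produce; fewer than 10 acres needs $250+.
        if acres >= 10:
            return produce_dollars >= 50
        return produce_dollars >= 250
    if year in (1980, 1990):
        # 1+ acres and $1000+ in produce.
        return acres >= 1 and produce_dollars >= 1000
    raise ValueError(f"no threshold rule sketched for {year}")


# The same household can be a farm under one definition and not under another.
print(meets_farm_thresholds(1970, acres=12, produce_dollars=60))
print(meets_farm_thresholds(1990, acres=12, produce_dollars=60))
```

This is why comparisons of farm counts across census years need the comparability notes: part of any change reflects the definition, not the households.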

Page 13

Additional ways in which IPUMS improves the original samples

Additional documentation, including all enumeration forms and instructions

Consistent occupation/industry classifications

Consistent metropolitan classifications

Constructed family variables

Locator variables for spouse and parents

Page 14

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data

Page 15

Quantity of IPUMS Data Downloaded

[Figure: Number of IPUMS-USA extract requests, by month, 2001-2003. Horizontal axis: Month (Sep 2001 to May 2003). Vertical axis: Extracts (0 to 2000).]

Page 16

Who uses the data?

Profile of IPUMS users

• Approximately 9,000 registered users

• About 90% are affiliated with universities

• Among those:
--40% are economists
--25% are sociologists

• Most other academics are from the social sciences

• Other main users include journalists and policy-makers

Page 17

How do people get IPUMS data?

15% download complete datasets
--1850-1970 datasets are less than 1 GB each
--1980-2000 datasets are about 5 GB each
--We provide raw data and command files

85% make “extracts” using the online interface
--Choose the variables you want
--We provide customized data and command files

?? go to data redistributors
--Querylogic (www.querylogic.com)
--PDQ (www.pdq.com)

Page 18

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data

Page 19

4 Key Strengths of the Census Microdata Samples

National in scope
--Results aren’t subject to local peculiarities
--Moreover, they provide context for local studies

Large
--Have more cases than any comparable datasets
--Enable study of relatively small populations

Long-term
--Provide historical depth

Microdata
--Can make your own tabulations
--Apply multivariate techniques

Page 20

Limitations of the Census Microdata Samples

Geographic detail
--Confidentiality restrictions 1940-2000

Decennial
--Any historical analysis must use 10-year gaps

Cross-sectional data
--Not longitudinal

Need knowledge of a statistical package

1-in-100 samples (1-in-20 for 1970-2000)
--Too small to answer some questions

Page 21

What type of question is IPUMS best suited for?

• Studies that do not need to identify geographic areas of fewer than 100,000 people after 1940. For example, you cannot identify Clemson, SC, but you can identify a group of several counties of which Clemson is a part.

• Subjects that are likely to deal with at least 10,000 people, preferably more. In a 1-in-100 sample, 10,000 individuals will generate about 100 cases in IPUMS. Anything less than this is probably too small a sample for useful analysis.

• Any analysis of a census-related question that is not answered by the published census volumes or summary files.
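The rule of thumb that 10,000 individuals yield about 100 cases is simply the population multiplied by the sampling fraction. The small Python sketch below works through that arithmetic; the function name is hypothetical, and the result is an expected count, so an actual sample will vary around it.

```python
def expected_cases(population, one_in):
    """Expected number of sampled records for a 1-in-`one_in` sample."""
    if one_in <= 0:
        raise ValueError("sampling density must be positive")
    return population / one_in


# 10,000 people in a 1-in-100 sample: about 100 cases.
cases_1pct = expected_cases(10_000, 100)
# The same group in a 1-in-20 (5%) sample: about 500 cases.
cases_5pct = expected_cases(10_000, 20)
print(cases_1pct, cases_5pct)
```

Running the same calculation for a candidate study population before requesting data is a quick way to judge whether the sample will be large enough.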

Page 22

An example: Southern migrants in the North, 1870-1970

Published census volumes can tell you
--How many southern-born persons of each race lived in each state in 1900, 1920, 1930, and 1960
--Occupations of all African-Americans in the North

But you’re also interested in
--The jobs held by actual migrants
--How their jobs compared to those who stayed home
--How their jobs compared to northern-born blacks
--How their settlement changed from 1870 onward

Page 23

An example: Why this analysis works

The numbers are very large
--Over 500,000 southerners are in the North in every decade from 1870 on

I don’t need to know particular towns
--State of residence is available in every census
--A sub-state designation known as State Economic Area (SEA) is even available for every census

Data not available anywhere else
--And so it is worth the trouble

Page 24

An example: What you can’t do with the IPUMS

How did the southerners do in Pittsburgh?

--IPUMS has data on 90 employed southern black men in Pittsburgh in 1970, fewer in previous years.

Were the migrants segregated in the North?

--You don’t know their street, tract, or ward. All you know is their city, and only if it was a fairly big one (>100K for 1940-50 and 1980-90; >250K for 1960-70; >100K in 2000).

Did migrants’ jobs improve over time?

--The census samples are cross-sectional databases, not longitudinal ones.

Page 25

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data

Page 26

Ongoing data projects at the MPC

New high-density Public Use files

1880:
--100% data for selected variables
--20% sample for minorities (all variables)
--10% sample for entire population (all variables)

1900: 10% sample

1930: 5% sample

1960: 5% sample

Page 27

Ongoing data projects at the MPC

New high-density Public Use files: number of person records in each file

[Figure: number of person records in each file, by census year (1850 to 2000). Vertical axis: 0 to 20,000,000. Series: existing samples; samples planned and in progress.]

Page 28

Ongoing data projects at the MPC

New harmonized intercensal series

American Community Survey
--Available from 2001-2002 on main IPUMS site
--2003 data will be available in the fall of 2004

March Current Population Survey
--Spans from 1962-2003
--Available at http://beta.ipums.org/cps
--Includes special questions on labor markets

Page 29

Ongoing data projects at the MPC

IPUMS International
--Currently contains 22 samples from 6 countries
--About 80 variables currently available

IPUMS Latin America
--15 country project
--Got underway this year

IPUMS Europe
--18 country project
--Got underway this year

Page 30

Lab 1: Introduction to the datasets

What is the IPUMS?

Who uses IPUMS?

What research is IPUMS best for?

Other IPUMS-like datasets

Getting and using the data