
IPUMS-International

Steven Ruggles

Minnesota Population Center

What is IPUMS-International?

• The IPUMS-International project is creating an integrated global database of over 150 censuses from at least 44 countries.

• It will be the world’s largest public-use population database, with multiple samples from each country enabling analyses across time and space.

• The microdata and accompanying documentation will be freely available for scholarly and educational research through a web-based data dissemination system.

The Problem

• A vast body of raw census microdata covering much of the world over the past four decades survives in machine-readable form.

• In most countries, these census microdata are either unavailable to researchers or difficult to obtain.

• These data are at constant risk of destruction because of technological obsolescence, physical aging of computer tapes, and loss of institutional memory and documentation.

Why it matters

• In the few countries where census microdata are readily available to researchers, they have become an indispensable part of social science infrastructure.

– In the journal Demography, the leading U.S. journal of population, census microdata are used three times as often as any other source for studies of the U.S. or Canada.

• No alternate source offers comparable sample sizes, chronological depth, or widespread availability across countries.

Advantages of Census Microdata Samples

• Large

– Many more cases than any alternative dataset

– Enable study of relatively small populations

– Allow analysis of the effects of local conditions on behavior

• Long-term

– Data usually available for multiple decades

• Flexible

– Tabulations can be customized to the research problem

– Multivariate analysis is feasible

– Harmonization is possible, allowing analyses that cross borders and time periods

Cross-National Harmonization and Open Access: National Academy of Sciences Recommendations

• “National and international funding agencies should establish mechanisms that facilitate the harmonization of data collected in different countries.”

• “Cross national studies conducted within a framework of comparable measurement can be a substantially more useful tool for policy analysis than studies of single countries.”

• “The scientific community, broadly construed, should have widespread and unconstrained access to the data.”

Source: Preparing for an Aging World: The Case for Cross-National Research (National Academy Press, 2001)

The Model: IPUMS-USA

• Project to harmonize U.S. Census microdata for the period 1850-2000

• 1992-1995: an NSF-funded IPUMS project harmonized the samples using composite codes and documented comparability (250,000 transformations; 3,000 pages of printed documentation)

• 1995-1999: Another NSF project funded an online data access system with integrated hypertext documentation

Success of IPUMS-USA

User-friendly access, harmonized codes, and integrated, comprehensive hypertext documentation led to a flood of historical census-based research:

• 12,000 users, 75,000 custom data extracts

• Currently distributing an average of 638 MB/hr, 24/7

• 1,300 publications and working papers

– IPUMS-based research is concentrated in the top U.S. journals: the most common venues are Demography, American Economic Review, Journal of Political Economy, American Sociological Review, Social Forces, and Quarterly Journal of Economics

IPUMS-International

• After 1960, most censuses around the world were tabulated by computer

• McCaa decided that the IPUMS model should be applied to other countries

• Began with a project for Colombia, then, in 1999, an NSF infrastructure grant to add six more countries

• 2005-2009: a new HSD grant to expand the database to 44 countries

• NICHD is also assisting with funding

IPUMS-International samples: First release

Country         Census year  % sample  N of persons  N of households
Brazil          1960         5            3,001,400          313,300
Brazil          1970         5            4,953,800        1,022,200
Brazil          1980         5            5,870,500        1,343,800
Brazil          1991         5.8          8,522,700        2,012,300
Brazil          2000         6           10,136,000        2,652,400
China           1982         0.1          1,002,700          242,700
Colombia        1964         2              349,700             n.a.
Colombia        1973         10           1,988,800          349,900
Colombia        1985         10           2,643,100          571,000
Colombia        1993         10           3,213,700          788,000
France          1962         5            2,320,900          748,900
France          1968         5            2,487,800          815,700
France          1975         5            2,629,400          915,600
France          1982         5            2,631,700          969,632
France          1990         4.2          2,360,900          949,893
Kenya           1989         5            1,074,100          224,900
Kenya           1999         5            1,410,200          318,200
Mexico          1960         1.5            502,800             n.a.
Mexico          1970         1              483,400           98,300
Mexico          1990         10           8,118,000        1,648,000
Mexico          2000         10.6        10,099,200        2,312,000
United States   1960         1            1,799,900          579,200
United States   1970         1            2,029,700          744,500
United States   1980         5           11,336,500        4,711,000
United States   1990         5           12,500,500        5,528,000
United States   2000         5           14,095,000        6,185,000
Vietnam         1989         5            2,627,000          534,200
Vietnam         1999         3            2,368,200          534,100
TOTAL                                   122,557,600       37,112,725
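As an arithmetic check on the samples table, the minimal Python sketch below re-adds the 28 samples by country. The rows are transcribed by hand from the table, and "n.a." household counts are treated as zero (an assumption made only so the column can be summed). Re-adding gives 122,557,600 persons and 37,112,725 households, the same as the TOTAL row.

```python
from collections import defaultdict

# (country, census year, % sample, persons, households); None marks "n.a."
SAMPLES = [
    ("Brazil", 1960, 5, 3_001_400, 313_300),
    ("Brazil", 1970, 5, 4_953_800, 1_022_200),
    ("Brazil", 1980, 5, 5_870_500, 1_343_800),
    ("Brazil", 1991, 5.8, 8_522_700, 2_012_300),
    ("Brazil", 2000, 6, 10_136_000, 2_652_400),
    ("China", 1982, 0.1, 1_002_700, 242_700),
    ("Colombia", 1964, 2, 349_700, None),
    ("Colombia", 1973, 10, 1_988_800, 349_900),
    ("Colombia", 1985, 10, 2_643_100, 571_000),
    ("Colombia", 1993, 10, 3_213_700, 788_000),
    ("France", 1962, 5, 2_320_900, 748_900),
    ("France", 1968, 5, 2_487_800, 815_700),
    ("France", 1975, 5, 2_629_400, 915_600),
    ("France", 1982, 5, 2_631_700, 969_632),
    ("France", 1990, 4.2, 2_360_900, 949_893),
    ("Kenya", 1989, 5, 1_074_100, 224_900),
    ("Kenya", 1999, 5, 1_410_200, 318_200),
    ("Mexico", 1960, 1.5, 502_800, None),
    ("Mexico", 1970, 1, 483_400, 98_300),
    ("Mexico", 1990, 10, 8_118_000, 1_648_000),
    ("Mexico", 2000, 10.6, 10_099_200, 2_312_000),
    ("United States", 1960, 1, 1_799_900, 579_200),
    ("United States", 1970, 1, 2_029_700, 744_500),
    ("United States", 1980, 5, 11_336_500, 4_711_000),
    ("United States", 1990, 5, 12_500_500, 5_528_000),
    ("United States", 2000, 5, 14_095_000, 6_185_000),
    ("Vietnam", 1989, 5, 2_627_000, 534_200),
    ("Vietnam", 1999, 3, 2_368_200, 534_100),
]


def totals_by_country(rows):
    """Return {country: (samples, persons, households)}; n.a. households count as 0."""
    acc = defaultdict(lambda: [0, 0, 0])
    for country, _year, _pct, persons, households in rows:
        acc[country][0] += 1
        acc[country][1] += persons
        acc[country][2] += households or 0
    return {country: tuple(v) for country, v in acc.items()}


if __name__ == "__main__":
    by_country = totals_by_country(SAMPLES)
    for country, (n, persons, households) in by_country.items():
        print(f"{country:<14} {n} samples {persons:>12,} persons {households:>11,} households")
    print("TOTAL", sum(v[1] for v in by_country.values()),
          sum(v[2] for v in by_country.values()))
```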

IPUMS-International Users

• Prospective users must sign a confidentiality agreement and provide an abstract explaining their need for the data

• Through September 1, 2005, 980 researchers had applied to use the database, and 582 of them (59 percent) were approved

• Users represent 40 countries and 250 institutions, including many international organizations (e.g., ILO, WHO, World Bank, Inter-American Development Bank)

Early results

A National Academy of Sciences panel (2005) used data from Colombia, Kenya, Mexico, and Vietnam to analyze changing outcomes such as schooling, work, fertility, and marriage as a function of age, gender, and household characteristics.

Early results

Cynthia Feliciano (2005) compared the educational attainment of immigrants to the United States with that of people who remained in their home countries, to understand patterns of immigrant selectivity.

Other topics include:

• Changing living arrangements of the aged

• Concentration of mortality within families

• Impact of rainfall on health and economic welfare

• Female labor-force participation and educational attainment

• Regional inequality differentials

• Brain drain from developing countries

• Effects of emigration on labor markets

• Relationship between divorce and family composition

• Relationship between disease factors and education

• Relationship between educational attainment and cohort size

• Effect of NAFTA on educational attainment and school enrollment by region within Mexico

Number of countries requested by IPUMS-International users (percent distribution)

Countries requested  Percent of users
1                    39
2                    24
3                    10
4                    6
5-8                  20

Most users request multiple countries

IPUMS-International Tasks

• Inventory and preservation of data and documentation

• Processing

• Documentation (especially comparability)

• Dissemination: obtain licenses that allow us to disseminate data for educational and scholarly use, and set up a secure web-based dissemination system


IPUMS-International Preservation Initiatives

UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3,000 microdata tapes and their metadata (documentation) recovered

Status of Data Acquisition

[Map. Legend: dark green = disseminating; medium green = data received; light green = negotiating]

Current IPUMS-International Partners (data received or agreement signed)

Current IPUMSI: Brazil, China, Colombia, France, Kenya, Mexico, United States, Vietnam

Latin America: Argentina, Bolivia, Chile, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay, Peru, Uruguay, Venezuela

Europe: Austria, Bulgaria, Belarus, Czech Republic, Germany, Greece, Hungary, Ireland, Netherlands, Romania, Slovenia, Spain, United Kingdom

Asia, Africa, Other: Armenia, Cambodia, Canada, Egypt, Fiji, Indonesia, Iraq, Israel, Malaysia, Mongolia, Pakistan, Palestinian Authority, Philippines, South Africa, Tajikistan, Turkmenistan

Current funding for 44 countries by 2009

Next data release: late spring 2006


Processing

1. Standardize format

2. Correct format errors

3. Draw samples

4. Add confidentiality protections

5. Harmonize codes

6. Edit and allocate missing or inconsistent data

7. Add standard constructed variables
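Step 5 (harmonize codes) builds on the composite coding used for IPUMS-USA. A minimal Python sketch of the idea, assuming a three-digit scheme in which the first digit is comparable across all samples and the trailing digits keep whatever extra detail a particular census recorded. The sample ids, source codes, and composite values are hypothetical, not actual IPUMS-International codes.

```python
# First digit of every composite code: general category, comparable across samples.
GENERAL_LABELS = {
    1: "single/never married",
    2: "married or in union",
    3: "separated or divorced",
    4: "widowed",
}

# Hypothetical recode tables: sample id -> {source code: composite code}.
# Trailing digits keep sample-specific detail; "00" means no extra detail.
RECODES = {
    # A census that recorded only four marital-status categories.
    "country_a_1970": {1: 100, 2: 200, 3: 300, 4: 400},
    # A census that distinguished marriage types and separation from divorce.
    "country_b_1990": {
        1: 100,  # single
        2: 211,  # civil marriage
        3: 212,  # religious marriage
        4: 220,  # consensual union
        5: 310,  # separated
        6: 320,  # divorced
        7: 400,  # widowed
    },
}


def harmonize(sample: str, source_code: int) -> int:
    """Map one source code to its composite code; fail loudly on undocumented codes."""
    try:
        return RECODES[sample][source_code]
    except KeyError:
        raise ValueError(f"no recode for code {source_code} in {sample}") from None


def general(composite: int) -> int:
    """Return the cross-sample comparable first digit of a three-digit composite code."""
    return composite // 100


if __name__ == "__main__":
    for sample, code in [("country_a_1970", 2), ("country_b_1990", 4), ("country_b_1990", 6)]:
        c = harmonize(sample, code)
        print(sample, code, "->", c, GENERAL_LABELS[general(c)])
```

The design point is that a cross-national analysis can use only the first digit, while a single-country study can still recover the detail in the trailing digits.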

Constructed Variables: IPUMS Family Interrelationship Pointers

Each pointer gives the Pernum of the person's spouse, father, or mother within the same household; 0 means no such person is present.

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse's location  Father's location  Mother's location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  0                  1
3       child         22   male    single     n/a     0                  0                  1
4       child         21   male    single     n/a     0                  0                  1
5       child         25   female  married    2       6                  0                  1
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  6                  5
8       grandchild    1    male    single     n/a     0                  6                  5
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  0                  9
11      non-relative  5    female  single     n/a     0                  0                  9
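A minimal Python sketch of how such pointers can be derived, for the example household on this slide, with a deliberately simplified rule set: spouses are married, opposite-sex persons in a compatible relationship pair; a mother is a woman in a compatible relationship who reports children ever born and is 15 to 50 years older; a father is the mother's spouse. The compatibility tables, the age window, and the function names are assumptions for illustration; the actual IPUMS pointer rules are more elaborate and are not reproduced here. For this household the sketch links persons 5 and 6 as spouses, gives mother pointers 1 (persons 2-5), 5 (persons 7-8), and 9 (persons 10-11), and gives father pointer 6 to the two grandchildren.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    pernum: int
    relate: str
    age: int
    sex: str
    marst: str
    chborn: Optional[int]  # children ever born; None where the slide shows "n/a"


HOUSEHOLD = [
    Person(1, "head", 53, "female", "separated", 6),
    Person(2, "child", 28, "male", "single", None),
    Person(3, "child", 22, "male", "single", None),
    Person(4, "child", 21, "male", "single", None),
    Person(5, "child", 25, "female", "married", 2),
    Person(6, "child-in-law", 28, "male", "married", None),
    Person(7, "grandchild", 3, "male", "single", None),
    Person(8, "grandchild", 1, "male", "single", None),
    Person(9, "non-relative", 32, "female", "separated", 2),
    Person(10, "non-relative", 10, "male", "single", None),
    Person(11, "non-relative", 5, "female", "single", None),
]

# Hypothetical compatibility rules (not the actual IPUMS rule set).
SPOUSE_PAIRS = {frozenset({"head", "spouse"}), frozenset({"child", "child-in-law"})}
MOTHER_RELATES = {  # child's relationship -> relationships its mother may have
    "child": {"head", "spouse"},
    "grandchild": {"child", "child-in-law"},
    "non-relative": {"non-relative"},
}
MIN_GAP, MAX_GAP = 15, 50  # assumed plausible mother-child age gap, in years


def spouse_loc(p: Person, hh: List[Person]) -> int:
    """Pernum of a married, opposite-sex person in a compatible pair, else 0."""
    if p.marst != "married":
        return 0
    for q in hh:
        if (q is not p and q.marst == "married" and q.sex != p.sex
                and frozenset({p.relate, q.relate}) in SPOUSE_PAIRS):
            return q.pernum
    return 0


def mother_loc(p: Person, hh: List[Person]) -> int:
    """Pernum of the first woman meeting the relationship, fertility, and age rules, else 0."""
    allowed = MOTHER_RELATES.get(p.relate, set())
    for q in hh:
        if (q.sex == "female" and q.relate in allowed and (q.chborn or 0) > 0
                and MIN_GAP <= q.age - p.age <= MAX_GAP):
            return q.pernum
    return 0


def pointers(hh: List[Person]) -> Tuple[Dict[int, int], Dict[int, int], Dict[int, int]]:
    """Return (spouse, father, mother) location dicts keyed by Pernum."""
    sploc = {p.pernum: spouse_loc(p, hh) for p in hh}
    momloc = {p.pernum: mother_loc(p, hh) for p in hh}
    # Father: the mother's spouse, when she has one in the household.
    poploc = {n: (sploc[m] if m else 0) for n, m in momloc.items()}
    return sploc, poploc, momloc


if __name__ == "__main__":
    sp, pop, mom = pointers(HOUSEHOLD)
    for p in HOUSEHOLD:
        print(p.pernum, p.relate, sp[p.pernum], pop[p.pernum], mom[p.pernum])
```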


Documentation

1. Translate codebooks, enumeration forms, and enumeration instructions into English

2. Standardize format and add XML tags

3. Write documentation identifying comparability problems across countries and, within countries, across time periods

4. Assemble and scan ancillary documentation (e.g. census maps, post-enumeration survey results, and additional information on post-enumeration processing).
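Step 2 calls for XML tagging of the translated documentation. As a minimal sketch, a tagged codebook entry could be generated with Python's standard `xml.etree.ElementTree`. The element names (`variable`, `label`, `categories`, `category`, `comparability`) and the literacy codes are hypothetical, not the actual IPUMS-International or DDI schema.

```python
import xml.etree.ElementTree as ET


def codebook_entry(name, label, categories, comparability):
    """Build one tagged variable description (hypothetical element names)."""
    var = ET.Element("variable", {"name": name})
    ET.SubElement(var, "label").text = label
    cats = ET.SubElement(var, "categories")
    for code, text in categories:
        # Each value label becomes its own element, with the code as an attribute.
        ET.SubElement(cats, "category", {"code": str(code)}).text = text
    ET.SubElement(var, "comparability").text = comparability
    return var


if __name__ == "__main__":
    entry = codebook_entry(
        "LIT",
        "Literacy",
        [(1, "No, illiterate"), (2, "Yes, literate"), (9, "Unknown")],  # hypothetical codes
        "Placeholder: describe how the literacy question differs across samples.",
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Keeping the comparability discussion as its own element lets the same source feed both printed codebooks and web pages.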

Variable Description: Literacy (International)


Dissemination

• Uniform perpetual agreements with national statistical agencies allow us to disseminate anonymized microdata to researchers who accept a web-based confidentiality agreement

• MPC staff assess research proposals for feasibility

• Disputes with agencies, if they arise, will be settled by the International Court of Arbitration in Paris

• Data dissemination occurs exclusively through the IPUMS-International web-based data access system