overview: access, privacy and confidentiality for integrated census microdata of 40+ countries
IPUMS-International: A Restricted Access Web-Site Providing Anonymized, Integrated Census Microdata for Social Science and Policy Research
Robert McCaa, Steven Ruggles, Matt Sobek (University of Minnesota) and Albert Esteve (Centre d’Estudis Demografics)
www.ipums.org/international 54th ISI, Berlin 2003
Overview: access, privacy and confidentiality for integrated census microdata of 40+ countries
1. Goals and accomplishments
2. Confidentiality and privacy protections: legal, administrative, technical
3. Data: cleaning, constructing, and integration
4. Access: custom-tailored extracts (not whole datasets); users and uses
5. Summary: new directions, 6 strengths and an aspiration
1. Goals and accomplishments
Four Goals:
» 1. Inventory the world’s census microdata
» 2. Preserve endangered microdata
   a. contract preservation with repositories
   b. deposit copies in at least two archives: National Statistical Organization and ... WHO…
» 3. Integrate datasets of authorized countries using UNSD and other standards
» 4. Disseminate extracts of database to approved researchers without charge (copy to each NSI)
Accomplishments:
Minnesota Population Center, University of Minnesota
Principal investigators (historians): Steven Ruggles, Robert McCaa
www.ipums.org/international
1998 First agreement signed
1999 Funding authorized
2002 First data release, 7 countries: China, Colombia, France, Kenya, Mexico, USA, Vietnam
2003 Regional projects: Latin America, Europe …
Preserve data & documentation
UN Demographic Center for Latin America (CELADE, Santiago, Chile)
3000+ microdata tapes preserved
Integration projects: 40 Partners + 1 (Table 1: August 16, 2003)
World Region: Official Statistical Authority
Africa: Ghana, Kenya, Madagascar
Americas: Argentina, Brazil, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Venezuela, USA
Asia: China, Tajikistan, Turkmenistan, Vietnam
Europe: Austria, Belarus, Bulgaria, Czech Republic, France, Germany, Greece, Hungary, Netherlands, Portugal, Romania, Russia, Slovenia, Spain, the United Kingdom
Middle East: Israel, Palestinian Authority
Data Access: first release May 2002
7 countries, 23 samples, ~60 million person records
USA: 1960, 1970, 1980, 1990, 2000
China: 1982
Colombia: 1964, 1973, 1985, 1993
France: 1962, 1968, 1975, 1982, 1990
Kenya: 1989, 1999
Mexico: 1960, 1970, 1990, 2000
Vietnam: 1989, 1999
2. Confidentiality and privacy protections
Confidentiality and privacy protections: changing perceptions
» Growing recognition that anonymized census microdata samples do not violate national legislation on statistical confidentiality and privacy
» International Monetary Fund’s General Data Dissemination System: 52 countries with uniform standards:
   » All enforce strict standards of statistical confidentiality
   » Prohibit disclosure of information which may identify individuals or entities
   » In 2000, 37 of 52 countries disseminate anonymized census microdata samples
Confidentiality protections, IPUMSI: legal, administrative, and technical
1. Dissemination agreement between University of Minnesota and each National Statistical Institute
   » Uniform 10 point protocol: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, and arbitration
2. Conditional use license between the University of Minnesota and each researcher
   » Permission to use restricted access microdata, 3 criteria: research need, research competence, and agree to conditions of use
3. Technical data protection measures
   » Specific to each country …/
Confidentiality protections, IPUMSI: technical
» Technical data protection measures
   » Adopt sample size according to national norms
   » Suppress detailed geography
   » Top and bottom code continuous variables
   » Suppress dates: (birth, migration, marriage, etc.)
   » “Swap” (recode) place of enumeration for a small fraction of households
   » Randomly order households within administrative units
» No semi-automatic procedures (e.g., μ-Argus)
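A minimal Python sketch of four of the measures listed above: top and bottom coding, date suppression, swapping place of enumeration, and random ordering within administrative units. The field names (`place`, `admin_unit`, `birth_date` and so on), the swap fraction and the pairing rule are assumptions made for illustration; the slides do not describe the procedures actually applied to any country's data.

```python
import random


def top_bottom_code(value, low, high):
    """Clamp a continuous value into [low, high] so extreme values do not stand out."""
    return max(low, min(high, value))


def suppress_dates(person, date_fields=("birth_date", "migration_date", "marriage_date")):
    """Return a copy of a person record without the (hypothetical) exact-date fields."""
    return {k: v for k, v in person.items() if k not in date_fields}


def swap_places(households, fraction, rng):
    """Exchange place of enumeration between randomly paired households.

    About `fraction` of the households are selected; the count is rounded
    down to an even number so that every selected household has a partner.
    """
    result = [dict(h) for h in households]  # leave the input untouched
    n_swap = int(len(result) * fraction)
    n_swap -= n_swap % 2
    chosen = rng.sample(range(len(result)), n_swap)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        result[i]["place"], result[j]["place"] = result[j]["place"], result[i]["place"]
    return result


def shuffle_within_units(households, rng):
    """Randomly order households inside each administrative unit, keeping units grouped."""
    by_unit = {}
    for h in households:
        by_unit.setdefault(h["admin_unit"], []).append(h)
    ordered = []
    for unit in sorted(by_unit):
        group = by_unit[unit]
        rng.shuffle(group)
        ordered.extend(group)
    return ordered
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which may help when a protected sample has to be regenerated.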
Only serious researchers need apply (Table 2)
Official Over-sight Boards cited by approved users
   Commission Nationale Information et Liberte
   Comite National d'Ethique
   Institutional Review Board--research on human subjects
   IRD scientific commission (Conseil Scientifique)
   ISA and its research committees RC28 and RC33
   National Committees for Research Ethics--Norway
   USA Federal Code title 13/title 26/title 5
   Vice-decanat a la recherche, l'ethique
Funding Agencies
   Canadian Foundation for Innovation
   Council for the Development of Social Science Research in Africa
   Economic and Social Research Council, UK
   National Science Foundation
   National Institutes of Health
   Norwegian University Development Aid Funding
   Rockefeller Foundation
   Wellcome Trust
Institutional affiliations (Europe & Canada)
   Cardiff University
   Demographic Studies Center - University A. of Barcelona
   Department of Statistics, University of Florence
   INED Paris
   Institut d'etudes politiques de Paris
   Institut francais de recherche en Afrique (IFRA)
   Ministry of Economic Development and Trade
   Novosibirsk State Technical University
   University College London
   Department of Demography, University of Montreal
   Queen's University
   Simon Fraser University
   Statistics Canada - Library and information centre
   University of Toronto
Institutional affiliations (Africa, Asia, Latin America)
   African Population and Health Research Center
   Centro de Investigacion y Docencia Economicas
   Hong Kong University of Science and Technology
   National University of Singapore
   The University of Nairobi
   The World Bank
   Universidad Externado de Colombia
   Universidad Pedagogica Experimental Libertador
   World Agro-Forestry Centre
   World Health Organization
3. Data enhancements & integration
Data Enhancements:
» Data quality and enhancements: added value
   » Clean data to eliminate duplicate records
   » Conduct internal consistency checks
   » Impute missing, inconsistent values
» Constructed variables to facilitate analysis
   » Pointer variables for Mothers, Fathers, Spouses
   » Family and household variables
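To illustrate why pointer variables facilitate analysis, here is a small Python sketch in which each person record carries a pointer to the within-household person number of their mother. The field names `pernum` and `mother_ptr`, and the convention that 0 means "no mother in the household", are assumptions for this example and are not taken from the slides.

```python
def attach_mother_attribute(household, attribute, pointer="mother_ptr"):
    """Copy `attribute` from each person's mother onto the person's own record.

    `household` is a list of dicts that each have a unique `pernum`.
    A pointer of 0 (or a missing pointer) yields None for the new field.
    """
    by_pernum = {p["pernum"]: p for p in household}
    enriched_household = []
    for person in household:
        mother = by_pernum.get(person.get(pointer, 0))
        enriched = dict(person)  # do not modify the input record
        enriched["mother_" + attribute] = mother[attribute] if mother else None
        enriched_household.append(enriched)
    return enriched_household
```

With such a pointer in place, a question like "age of mother at the child's birth" becomes a lookup within the household rather than a record-matching exercise; the same pattern would apply to father and spouse pointers.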
Integration (not standardization):
» Adopt uniform coding schemes, nomenclatures and classifications
   » United Nations Statistics Division (Principles & Recs)
   » UNESCO (ISCED)
   » International Labor Office (ISCO-88)
» Composite coding scheme; 2 simple, but seemingly contradictory rules (Table 3, next slide):
   1. Retain original detail
   2. Harmonize each digit …/
IPUMSI  IPUMSI  Col   Col   Fra   Fra   Ken   Mex   Mex   US    Viet  Viet
Code    Label   1964  1993  1962  1975  1999  1970  2000  1960  1989  1999
0000 N/A *,5 B * B BB 0 BB 00 B B,1
ACTIVE (In Labor Force)
1000 EMPLOYED, not specified 1 1
1100 At work 4 1 1 01 1 10 10
1101 At work, and 'student' 14
1102 At work, and 'housework' 15
1103 At work, and 'seeking work' 13
1104 At work, and 'retired' 16
1105 At work, and 'no work' 18
1106 At work, public emergency 11
1107 At work, family holding, not specified
1108 At work, family holding, not agricultural 03
1109 At work, family holding, agricultural 04
1110 Working and studying (France)
1200 Have job, not at work last week 3 02 20 12
1300 Armed forces 13
1301 Armed forces, at work 14
1302 Armed forces, not at work last week 15
1303 Military trainee (France) 8 6
2000 UNEMPLOYED, not specified 2 3 05 2 30 20
2001 Unemployed (Vietnam) 4 5
2002 Worked less than 6 months, permanent job 2
2003 Worked less than 6 months, temporary job 6
2100 Unemployed, experienced worker 1 21
2101 Seeking work, worked less than 3 months 2
2102 Seeking work, worked 3 to 6 months 3
2103 Seeking work, worked 6 to 12 months 4
2104 Seeking work, worked more than 1 year 5
2105 Seeking work, experience unspecified 6
2200 Unemployed, new worker 2 7 22
3000 INACTIVE (Not in Labor Force) 30
3100 Housework 3 6 10 3 50 31 6 2
3200 Unable to work/disabled 7 7 09 70 32 7 4
3300 In school 4 5 9 5 07 40 33 5 3
3400 Retirees and living on rent 8 60
3401 Living on rent payments
3402 Retirees/pensioners 8 4 08
3500 Elderly 6
3600 No work available/discouraged 06
3700 Inactive, other reasons 9 0 0 0 11 4 80 34 6
9000 UNKNOWN/MISSING 9 00 9 99 9
Note: In the source data columns: a comma indicates more than one code was coded to the respective IPUMS-International value; an asterisk means programming logic was used; B indicates a blank in the source data.
Translation Table for Employment Status
Harmonized Codes and Labels | Source Data Codes (selected samples)
Composite coding scheme: Employment Status
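The two rules of the composite scheme can be sketched in Python as follows. The harmonized codes and labels are copied from the table; the source-code mappings in `TRANSLATION` and the sample names `sample_a` and `sample_b` are hypothetical, because the column alignment of the source codes cannot be read reliably from this transcript. The function names are likewise illustrative.

```python
# A few harmonized codes and labels, copied from the translation table.
HARMONIZED_LABELS = {
    "1100": "At work",
    "1200": "Have job, not at work last week",
    "2000": "UNEMPLOYED, not specified",
    "2200": "Unemployed, new worker",
    "3100": "Housework",
    "3300": "In school",
    "9000": "UNKNOWN/MISSING",
}

# Meaning of the first digit, following the table's top-level rows.
FIRST_DIGIT_GROUPS = {
    "0": "N/A",
    "1": "EMPLOYED",
    "2": "UNEMPLOYED",
    "3": "INACTIVE (Not in Labor Force)",
    "9": "UNKNOWN/MISSING",
}

# HYPOTHETICAL source-to-harmonized mappings for two made-up samples.
TRANSLATION = {
    "sample_a": {"1": "1100", "2": "2000", "3": "3100", "9": "9000"},
    "sample_b": {"01": "1100", "02": "1200", "07": "3300"},
}


def harmonize(sample, source_code):
    """Translate a sample-specific source code into the 4-digit harmonized code."""
    return TRANSLATION[sample][source_code]


def collapse(code, digits=1):
    """Keep the first `digits` digits and zero-fill the rest, e.g. '1104' becomes '1000'."""
    return code[:digits].ljust(len(code), "0")


def general_group(code):
    """Return the broad category shared by every code with the same first digit."""
    return FIRST_DIGIT_GROUPS[code[0]]
```

Under this reading, a researcher who needs only broad comparability across countries can work with `collapse(code, 1)`, while the trailing digits keep whatever detail a particular census recorded.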
Integration Work Plan:
» Assemble microdata and documentation (MPC, NSI)
» Develop samples to minimize confidentiality risk and maximize robustness (MPC or NSI partner)
» Design national integration plan (NSI, consultants)
   census-by-census
   concept-by-concept
   code-by-code
» Write integrated documentation (MPC, partners)
» Program integration (MPC)
Photos from Colombia integration project, February-March, 2000:
4 experts from DANE (census office) + 7 academics (3 universities)
Standard: UN/Eurostat Principles & Recs...
Census documentation compiled for Colombian microdata
4. Access
Data Access: web-based extraction system
» Password protected: to make and retrieve extracts
» Researcher selects:
   » countries,
   » censuses,
   » cases/sub-populations,
   » variables, and
   » sample densities
» Extract engine queues request, generates extract
» Researcher retrieves extract via web
» NO: CDs, original codes, or complete datasets
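The request, queue and retrieve cycle described above might be modeled along the following lines. This is a sketch under stated assumptions, not the project's software: `ExtractRequest`, `ExtractEngine`, the approved-user set standing in for password protection, and the every-k-th-record rule for sample density are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExtractRequest:
    user: str
    samples: List[str]                                        # country-census identifiers
    variables: List[str]                                      # harmonized variables to deliver
    case_filter: Callable[[Dict], bool] = lambda rec: True    # sub-population selector
    density: float = 1.0                                      # fraction of cases to keep


class ExtractEngine:
    def __init__(self, database, approved_users):
        self.database = database            # maps sample name to a list of records
        self.approved_users = set(approved_users)
        self.queue = deque()
        self.ready = {}                     # ticket -> (owner, rows)
        self._next_ticket = 1

    def submit(self, request):
        """Queue a request from an approved researcher and return a ticket number."""
        if request.user not in self.approved_users:
            raise PermissionError("user is not an approved researcher")
        ticket = self._next_ticket
        self._next_ticket += 1
        self.queue.append((ticket, request))
        return ticket

    def process_next(self):
        """Generate the oldest queued extract: selected cases, requested variables only."""
        ticket, req = self.queue.popleft()
        step = max(1, round(1 / req.density))   # assumed rule: keep every step-th case
        rows = []
        for sample in req.samples:
            selected = [r for r in self.database[sample] if req.case_filter(r)]
            for rec in selected[::step]:
                rows.append({"sample": sample, **{v: rec[v] for v in req.variables}})
        self.ready[ticket] = (req.user, rows)
        return ticket

    def retrieve(self, user, ticket):
        """Hand a finished extract back to the researcher who requested it."""
        owner, rows = self.ready[ticket]
        if owner != user:
            raise PermissionError("extract belongs to another user")
        return rows
```

Because each row is rebuilt from the requested variables only, fields that were not asked for never leave the database, which is in the spirit of delivering custom-tailored extracts rather than complete datasets.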
5. Regional initiatives & summary
IPUMS-Latin America, 2003-2007: 16 countries, ~500m. people
» Scope: Latin American census microdata, 1960-present
» Work Plan
   » …: … licensing agreements with official agencies
   » 2002-3: Obtain funding from U.S. NIH
   » 2004: Develop/translate microdata & metadata
   » : Country expert teams design national integrations
   » 2005: MPC/expert teams design regional integration
   » 2006: MPC integrates microdata and metadata
   » 2007: MPC disseminates to bona fide researchers who show need and agree to conditions of use.
ICM-Europe: 14+ national teams
Table 5. Europe: Microdata Availability for project (bold) by Country
Country                1960s     1970s  1980s  1990s  2000s
Austria                1961      1971   1981   1991   2001
Belarus                -         -      1989   1999   …
Bulgaria               1965      1975   1985   1992   2001
Czech Republic         1961      1970   1980   1991   2001
France                 1968, 62  1975   1982   1990   1999
Germany                1961      1970   1987   micro  micro
Greece                 1961      1971   1981   1991   2001
Hungary                -         1970   1980   1990   2001
Netherlands            1960      1971   -      1991   2003
Portugal               1960      1970   1981   1991   2001
Romania                1965      1977   -      1992   2002
Russia                 1970      1979   1989   1994   2002
Slovenia               -         -      1981   1991   2002
Spain                  1960      1970   1981   1991   2001
United Kingdom         1961      1971   1981   1991   2001
Total sets in project  3         6      10     14     14
Summary, 6 strengths and an aspiration
1. Uniform legal authorization
2. Access restricted to scientists with need
3. Experienced integration teams
4. Proven web-based distribution system
5. High user satisfaction
6. Sustainability: MPC, ICPSR, WHO
Aspiration: 90 countries, 90% world’s population by 2010…
additional information at: http://www.ipums.org/international
Contact: Robert McCaa, [email protected]