Taking the Best out of Both Worlds? The Linkage of Surveys and Administrative Data

GESIS, Mannheim, September 18, 2014

Stefan Bender, Manfred Antoni, Joe Sakshaug, Frauke Kreuter, Alexandra Schmucker


Use of Administrative Data in Publications in Leading Journals, 1980-2010 (Raj Chetty)

Motivation

• Starting point:
  ‐ Increasing demand for comprehensive, longitudinal data in the social sciences.
  ‐ Rising problems with surveys, e.g. falling response rates, increasing costs (Groves 2011).
  ‐ Process-produced data (Big Data, administrative data) are increasingly examined regarding their value for research (Kreuter/Peng 2014).
  ‐ Each of these data sources has its specific shortcomings.

• Remedy:
  ‐ Balancing the disadvantages of different data sources by combining their advantages

• Implementation:
  ‐ Create more comprehensive datasets using data linkage

Outline

• What is administrative data?
• Differences between administrative and survey data
• Advantages/disadvantages
• Linking administrative and survey records
  ‐ Examples from the IAB
  ‐ Linkage
  ‐ Informed consent
• International access to linked data (FDZ)
• Conclusions
• Extra: Big data and informed consent

Advantages and disadvantages: Survey data

• Advantages:
  ‐ Specifically designed for research purposes (see Groves 2011)
  ‐ Total survey error framework (see Groves/Lyberg 2010)
  ‐ Subjective information on behaviours, attitudes etc.

• Disadvantages:
  ‐ Missing data (unit nonresponse, item nonresponse, panel attrition)
  ‐ Misreporting (e.g. recall errors in retrospective interviews)
  ‐ Time restrictions
  ‐ High costs

What is administrative data?

• “Secondary” or “process” data that are collected and used primarily for administrative purposes
• Often generated by government agencies and public/private sector organizations, which keep records of the services they deliver and the processes they register
• Examples
  ‐ Social security system (health, pension and employment)
  ‐ Unemployment, active labor market programs, social benefits
  ‐ Pupil records
  ‐ Tax and income records
  ‐ Information collected from birth/death certificates

Administrative vs. Survey Data

• Unlike survey data, the primary use of administrative data is not for research purposes
• Administrative data are usually collected for an entire population rather than a sample
• Administrative populations often differ from traditional survey populations
  ‐ Patients covered under a particular health insurance organization
  ‐ Persons diagnosed with a particular illness (e.g., cancer, HIV/AIDS)
  ‐ Persons with an established credit history
• Researchers cannot add specific measures to administrative data
• Administrative data sources are usually longitudinal

Advantages of Using Administrative Data

• Relatively inexpensive to obtain and use
• Saves money and resources for data collection, since data are already available
  ‐ No respondent burden
• Can be more accurate than survey data because some measurement issues (e.g., forgetting, social desirability) are avoided
• Can provide detailed longitudinal information
  ‐ Lifetime earnings, medical expenditures
  ‐ Such information may be too burdensome for respondents to report

Advantages (cont.)

• Often contain very large samples that would be too costly to achieve in surveys
• Databases are regularly updated, sometimes continuously
• Data are collected systematically with quality control checks
• Nearly 100% coverage of the population of interest
  ‐ Includes individuals who may not respond to surveys

Disadvantages of Using Administrative Data

• Administrative data alone is usually not sufficient to answer most research questions
• Researcher has no control over administrative content
• Such data may not contain all relevant variables of interest
  ‐ E.g., socio-demographic characteristics (such as education), household composition, self-employment income, habits and behaviors, opinions and attitudinal measures, expectations, retirement plans
  ‐ Surveys can collect these variables, which can be used in conjunction with administrative data
• Concepts, definitions, reference dates, and coverage of administrative variables may not meet the research objectives

Disadvantages (cont.)

• There may be quality issues associated with variables that are not central to the administrative tasks
• Variables may change over time without notice and without any transformation
  ‐ E.g., occupation/industry codes
• Metadata (description and background) may be very limited
• Administrative data sources are often very large and their use can lead to significant processing costs
• Strong data protection laws may complicate the data access process and/or place restrictions on the publication of results

Combining Strengths of Both

• Combining survey and administrative data may provide the best of both worlds and mitigate their disadvantages
• Increases the number of relevant variables for research purposes
• Administrative variables with poor quality could be replaced with higher quality survey variables, and vice versa
• Researcher has more control over the content of the data
• If the administrative database serves as the sampling frame, extensive nonresponse bias analysis becomes possible
• The IAB has a strategy for using administrative data before and after survey data collection

Administrative Data of the RDC (FDZ) of the German Federal Employment Agency (BA)

• Micro labor market data on individuals/households and establishments

[Diagram: data available at the FDZ]
• Surveys
• Administrative data
  ‐ Social security notifications
  ‐ Process-generated data of the BA

BIG DATA

Exemplary project I: WeLL-ADIAB (I) (see Bender et al. 2009)

• Data sources ([S] = survey data, [A] = administrative data):
  ‐ Employee survey (project ‘Further Training as Part of Lifelong Learning’) [S]
  ‐ IAB Establishment Panel [S]
  ‐ Employment biographies [A]
  ‐ Establishment histories [A]

• Data linkage:
  ‐ Informed consent for linkage
  ‐ Linkage using social security and establishment numbers

• Data access:
  ‐ On-site use at the FDZ or via job submission

Exemplary project I: WeLL-ADIAB (II)

[Diagram: the WeLL Employee Panel and the IAB Establishment Panel (survey data) linked with Employment Biographies and Establishment Histories (administrative data)]

IAB Example: PASS

• “Labor Market and Social Security” (PASS) survey
• Mixed-mode study (telephone and face-to-face) conducted in Germany
• Survey consists of two independent subsamples:
  • General population sample
    ‐ Drawn from a commercial database covering all private household addresses
  • Benefit recipient sample
    ‐ Drawn from Federal Employment records of persons who received unemployment benefits at the reference date

IAB Example: PASS (cont.)

• Drawing from administrative records permits the study of nonresponse bias in each subsample
  ‐ The IAB routinely exploits this opportunity
• However, these records cannot be released to the public without informed consent from the survey unit
• PASS asks respondents for consent to link survey and administrative records for research purposes
  ‐ Consent rate almost 80 percent

Conceptual Pathway to Linkage

[Diagram: conceptual pathway to linkage]
Sample frame / admin data: $Y$
  Sample: $Y_n$
    Responders: $Y_r$
      Consenters: $Y_c$
        Linked: $Y_L$
        Non-linked: $Y_{NL}$
      Non-consenters: $Y_{nc}$
    Non-responders: $Y_{nr}$

Linking Survey and Administrative Data

• Linking survey and administrative data is becoming increasingly common in the social and health sciences
• Basic idea of linkage
  ‐ Identify common variables in both data sets
  ‐ Link each survey record to the corresponding administrative record based on matching variables
• Different methods for linking survey and administrative data
  ‐ Exact matching
  ‐ Probabilistic matching
  ‐ Statistical matching (or data fusion)

Linked records belonging to the same unit
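To make the probabilistic method named above more concrete, here is a minimal sketch that scores candidate record pairs with agreement weights in the style of Fellegi and Sunter, which is one common approach; the slides do not say which method is actually used. The comparison fields, the m/u probabilities and the threshold are made-up assumptions for illustration only.

```python
import math

# Illustrative assumptions, not values from the slides:
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different persons)
FIELDS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip_code": (0.90, 0.10),
}


def match_weight(survey_rec, admin_rec):
    """Sum of log2 likelihood ratios over the comparison fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if survey_rec[field] == admin_rec[field]:
            total += math.log2(m / u)               # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))   # disagreement subtracts it
    return total


def best_link(survey_rec, admin_records, threshold=5.0):
    """Return the highest-scoring admin record, or None if below threshold."""
    scored = [(match_weight(survey_rec, a), a) for a in admin_records]
    weight, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if weight >= threshold else None
```

Unlike exact matching, a pair can still be linked when one field disagrees (for example after a move), as long as the remaining fields carry enough evidence.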


Exact Linkage

• A link is established based on a single unique identifier
  ‐ Social Security number
  ‐ Establishment number
• Purely deterministic approach
• Exact 1-to-1 matching
• Usually the survey must request the unique identifier from the respondent prior to linkage
  ‐ Bundled into the informed consent statement
  ‐ Assumed that the identifier is recorded without error
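The deterministic 1-to-1 linkage described above can be sketched in a few lines. This is an illustration under stated assumptions, not the FDZ's implementation: the record layout, the key name `ssn`, and the `consent` flag are hypothetical, and the identifiers used with it should be dummy values.

```python
def exact_link(survey_records, admin_records, key="ssn"):
    """Deterministic 1-to-1 linkage on a single unique identifier.

    Records are plain dicts; the field names are hypothetical.
    """
    admin_by_id = {}
    for rec in admin_records:
        if rec[key] in admin_by_id:
            # A unique identifier must not occur twice in the admin file.
            raise ValueError(f"duplicate identifier in admin data: {rec[key]}")
        admin_by_id[rec[key]] = rec

    linked, unlinked = [], []
    for rec in survey_records:
        # Only respondents who gave informed consent may be linked.
        match = admin_by_id.get(rec[key]) if rec.get("consent") else None
        if match is None:
            unlinked.append(rec)
        else:
            linked.append({**match, **rec})
    return linked, unlinked
```

Because the approach assumes the identifier is recorded without error, a single mistyped digit leaves the record unlinked; that is the case probabilistic matching is meant to handle.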


German Record Linkage Center (GRLC)

FDZ Nuremberg (focus: service facility)
  ‐ Project advisory center
  ‐ Conducting (privacy-preserving) record linkage
  ‐ Secure access to linked data

University of Duisburg-Essen (focus: research unit)
  ‐ Development and evaluation of linkage methods
  ‐ Development of free linkage software
  ‐ Dissemination of current research results

Tutorials on record linkage

Financed by DFG

Conceptual Pathway to Linkage

[Diagram repeated: the conceptual pathway from sample frame to linked records shown earlier]

Informed Consent

• Informed consent is believed to be an effective means of respecting individuals as autonomous decision makers with rights of self-determination.
• In Germany, informed consent is defined by law.
• Before linking administrative with survey data, informed consent of the surveyed units is needed.

The Selectivity of Consent

• Correlates of consent
  ‐ Age, race/ethnicity, gender, education, marital status, wealth, earnings, health status, health insurance, employment (Sala et al., 2014; Sala et al., 2012; Bates and Pascale, 2006; Jenkins et al., 2006; Banks et al., 2005; Dunn et al., 2003; Young et al., 2001; Woolf et al., 2000; Olson, 1999; Pullen et al., 1992)
  ‐ Item missing data, interviewer characteristics, prior-wave outcomes (Sala et al., 2012; Jenkins et al., 2006)
  ‐ Wording and placement of consent request (Sala et al., 2014; Sakshaug and Kreuter, 2014; Sakshaug et al., 2013)
• Most studies have only looked at the selectivity of survey estimates, but selectivity of key administrative estimates is also a concern

PASS Example: Estimating Linkage Consent Bias

• “Labour Market and Social Security” (PASS) survey
• Almost 80% of respondents consented to linkage of Federal employment records
• Employment records contain several variables used to administer welfare and unemployment benefits
  ‐ Wages, benefits, and employment spells considered to be most reliable
• Research questions
  ‐ Do linkage consent biases exist for some administrative variables?
  ‐ How do consent biases compare to other sources of error?

Conceptual Pathway to Linkage

[Diagram: conceptual pathway to linkage, with error sources]
Sample frame / admin data: $Y$
  Sample: $Y_n$
    Responders: $Y_r$
      Consenters: $Y_c$
        Linked: $Y_L$
        Non-linked: $Y_{NL}$
      Non-consenters: $Y_{nc}$
    Non-responders: $Y_{nr}$
Error sources marked on the pathway: nonresponse bias, non-consent bias, measurement bias

Estimates of Linkage Consent Bias Relative to Other Sources of Bias

Variable            Nonresponse bias   Measurement bias   Non-consent bias
Age                        0.1               0.03              -0.3*
Foreign (%)               -5.6*             -2.5*              -0.9*
UB II (%)                  3.2*             -7.1*              -0.3
Disability (%)             0.4               6.0*               0.01
Employed (%)               1.0              -0.6                0.3
Income (30 days)         -71.4*            394.5*               1.7

* p < 0.05. Source: Sakshaug and Kreuter (2012)

• Non-consent biases are present, but generally smaller than other sources of error
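As a rough illustration of how the three columns above can be computed when the administrative value is known for everyone on the sampling frame, the sketch below uses one common set of definitions: nonresponse bias as the respondent mean minus the full-sample mean, non-consent bias as the consenter mean minus the respondent mean, and measurement bias as the mean survey report minus the mean administrative value among respondents. These definitions and all field names are assumptions made for illustration and may differ in detail from those in Sakshaug and Kreuter (2012).

```python
from statistics import mean


def bias_decomposition(sample):
    """Compute three bias components for one variable.

    sample: list of dicts with hypothetical keys
      'admin'     administrative value (known for the whole sample)
      'responded' True if the person took part in the survey
      'consented' True if the respondent agreed to linkage
      'reported'  survey answer (None for nonrespondents)
    """
    respondents = [p for p in sample if p["responded"]]
    consenters = [p for p in respondents if p["consented"]]

    y_n = mean(p["admin"] for p in sample)        # full sample
    y_r = mean(p["admin"] for p in respondents)   # respondents
    y_c = mean(p["admin"] for p in consenters)    # consenting respondents
    reported_r = mean(p["reported"] for p in respondents)

    return {
        "nonresponse": y_r - y_n,
        "nonconsent": y_c - y_r,
        "measurement": reported_r - y_r,
    }
```

The key design point is that every component is evaluated on the administrative variable, which is only possible because the sample was drawn from the administrative frame.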


Linkage and Informed Consent Rates

Name                  Linkage   Consent
ALWA                     86        92
PASS                     90        86
SAVE                     78        49 (57)
AeKo                     78        99
SFB 882                  73        88
IAB-SOEP Migration       96        50
WeLL                    100        92

The Research Data Centre of the BA in the IAB

• Tasks of the Research Data Centre (FDZ):
  ‐ Preparation, standardization and documentation of research data
  ‐ Secure data access
  ‐ Advisory service on analytic potential, scope, validity and handling of data
• Several projects on data linkage using different sources since the FDZ’s establishment in 2004
• Provision of (linked) data to external researchers

Data Access

Access is Easy, Quick and Cheap

• Easy
  ‐ Non-technical project proposal
  ‐ Approval by RDC (off-site use) or Federal Ministry (on-site use)
  ‐ Use agreement with the institution of the researcher
• Quick
  ‐ (Estimated) time until user/institution receives contract:
    ‐ 2 weeks for off-site access (scientific use file)
    ‐ 6 weeks for on-site access
• Cheap
  ‐ Data access is free of charge
  ‐ No lab fees
  ‐ No restrictions on hours/visits of on-site use facilities or runs of remote executions

Data Access I

• UKDA, Essex will be next

Summary & outlook

• Data linkage allows a combination of traditional/designed research data and process-produced data from various sources.
• Linked data may help researchers to understand the data-generating process and to determine whether model assumptions are met.
• The total survey error framework has to be applied more thoroughly to process-produced data.
• Granting access to linked micro-data is generally possible, but
  ‐ Increased richness of data also increases the risk of deanonymisation.
  ‐ Ways of access to single data sources may not be suitable for their combination.

→ The FDZ needs to improve in terms of (remote) access to linked data.
→ Anyone is welcome to do research with and on our linked data sets!

Quality, analytic potential and accessibility of linked administrative, survey and publicly available data
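The point that richer data raise the risk of deanonymisation can be illustrated with a simple uniqueness check: the more variables a linked file contains, the more records become unique on their combination of values. The sketch below is only an illustration; the function name and the variables in the usage note are hypothetical, and real disclosure control uses more elaborate measures.

```python
from collections import Counter


def share_unique(records, quasi_identifiers):
    """Share of records whose combination of quasi-identifier values is unique."""
    def combo(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    unique = sum(1 for r in records if counts[combo(r)] == 1)
    return unique / len(records)
```

Calling it first with a single variable (say a hypothetical `birth_year`) and then with additional linked variables (say `region` and `occupation`) shows how the share of unique records can only stay the same or grow as variables are added.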


www.cambridge.org/9781107637689
www.dataprivacybook.org

Privacy, Big Data, and the Public Good: Frameworks for Engagement

Edited by
Julia Lane, American Institutes for Research, Washington DC
Victoria Stodden, Columbia University
Stefan Bender, Institute for Employment Research of the German Federal Employment Agency
Helen Nissenbaum, New York University

Massive amounts of data on human beings can now be analyzed. Pragmatic purposes abound, including selling goods and services, winning political campaigns, and identifying possible terrorists. Yet “big data” can also be harnessed to serve the public good: scientists can use big data to do research that improves the lives of human beings, improves government services, and reduces taxpayer costs. In order to achieve this goal, researchers must have access to this data – raising important privacy questions. What are the ethical and legal requirements? What are the rules of engagement? What are the best ways to provide access while also protecting confidentiality? Are there reasonable mechanisms to compensate citizens for privacy loss?

The goal of this book is to answer some of these questions. The book’s authors paint an intellectual landscape that includes legal, economic, and statistical frameworks. The authors also identify new practical approaches that simultaneously maximize the utility of data access while minimizing information risk.

Contributors Katherine J. Strandburg; Solon Barocas and Helen Nissenbaum; Alessandro Acquisti; Paul Ohm; Victoria Stodden; Steven E. Koonin and Michael J. Holland; Robert M. Goerge; Peter Elias; Daniel Greenwood, Arkadiusz Stopczynski, Brian Sweatt, Thomas Hardjono, and Alex Pentland; Carl Landwehr; John Wilbanks; Frauke Kreuter and Roger Peng; Alan F. Karr and Jerome P. Reiter; Cynthia Dwork

Order Today! Visit www.cambridge.org/9781107637689 or call 1.800.872.7423

20% Discount Promo Code: F4LANE

Forthcoming for 2014

Book Goals

• Massive amounts of data on human beings can now be analyzed.
• Pragmatic purposes abound, including selling goods, winning political campaigns, and identifying possible terrorists.
• Big data can also be harnessed to serve the public good: scientists can use big data to do research that improves lives of human beings and more.
• To achieve this goal, researchers must have access to this data, raising important privacy questions.
  ‐ What are the legal requirements?
  ‐ What are the rules of engagement?
  ‐ What are the best ways to provide access while also protecting confidentiality?
  ‐ Are there reasonable mechanisms to compensate citizens for privacy loss?

Anonymity, Reachability, Information flow

• Anonymity and consent are attractive: anonymization seems to take data outside the scope of privacy.
• The value of anonymity inheres not in namelessness, but instead in something we called “reachability”, with or without access to identifying information.
• Even when individuals are not ‘identifiable’, they may still be ‘reachable’, and may be subject to consequential inferences and predictions taken on that basis.
• Big data involves practices that have radically disrupted entrenched information flows.

Book chapter by Barocas and Nissenbaum

The Tyranny of the Minority

• The willingness of a few individuals to disclose certain information implicates everyone else who happens to share the more easily observable traits that correlate with the revealed trait.
• This is the tyranny of the minority: the volunteered information of the few can unlock the same information about the many.

Book chapter by Barocas and Nissenbaum

Inference

• A lot can be predicted about a person’s actions without knowing anything personal about them (especially in a big data context).

Book chapter by Barocas and Nissenbaum

Informed Consent

• Informed consent is believed to be an effective means of respecting individuals as autonomous decision makers with rights of self-determination.
• Thus, where anonymity is unachievable or simply does not make sense, informed consent often is the mechanism sought out by conscientious collectors and users of personal information.
• Understood as a crucial mechanism for ensuring privacy, informed consent is a natural corollary of the idea that privacy means control over information about oneself.

Book chapter by Barocas and Nissenbaum

Transparency

• The ideal offers data or human subjects true freedom of choice based on a sound and sufficient understanding of what the choice entails.
• Simplicity and clarity unavoidably result in losses of fidelity.
• Plain-language notices cannot provide the information that people need to make decisions about complex contents in big data.

Book chapter by Barocas and Nissenbaum

My Conclusion for Big Data

• Blend big data and survey-based/official data.
• Use RDC structure for access to big data or combined data.
• No longer hands-on work with data.
• Discussion of many topics needed: informed consent, non-participation, inference, privacy …
• Main issues: data protection, access and trust.

→ We have to be more active in the public discussion, because big data is affecting our daily work!

www.iab.de

http://fdz.iab.de/en.aspx

Stefan Bender


German Administrative Data

Social Security Notifications

• Procedure:
  ‐ Employers submit notifications to the social security system
  ‐ For every employee and marginal worker covered by the social security system (notification requirement)
  ‐ Annually, or at the beginning or end of employment, an employment interruption, or a change of health insurance
  ‐ Identification: social security number and establishment number

• Purpose of data collection:
  ‐ Calculation of social security contributions and (unemployment) benefits
  ‐ Statistics

Procedure of Social Security Notifications (simplified)

[Diagram: Establishments/employers → receiving offices of the notification procedure (health insurance companies) → German Federal Pension Fund → Federal Employment Agency]

Notification to the Social Security System:

• Social Security Number
• Establishment Number
• Last Name
• First Name
• Address
• Reason for Notification
• Times of Employment (on a daily basis)
• Nationality
• School Education
• Vocational Training
• Type of Employment
• Wages
• Occupational Status
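For readers who want to work with such notifications in code, the field list above maps naturally onto a record type. The sketch below is a hypothetical representation: the field names follow the slide, but every type, the split of "times of employment" into a start and end date, and the helper method are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SocialSecurityNotification:
    # Field names mirror the slide; all types are assumptions.
    social_security_number: str
    establishment_number: str
    last_name: str
    first_name: str
    address: str
    reason_for_notification: str
    employment_start: date   # employment times are recorded on a daily basis
    employment_end: date
    nationality: str
    school_education: str
    vocational_training: str
    type_of_employment: str
    wages: float
    occupational_status: str

    def duration_days(self) -> int:
        """Length of the notified period, counting both the first and last day."""
        return (self.employment_end - self.employment_start).days + 1
```

Making the record immutable (`frozen=True`) reflects that a notification is a filed document; corrections would arrive as new notifications rather than edits.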


2. Administrative Data

• No information about civil servants, freelancers or the self-employed
• Internal processes of the Federal Employment Agency
  ‐ Payment/receipt of unemployment benefits
  ‐ Participation in labour market programs
  ‐ Registered job search
• Exact start and end dates
• Computer-aided processes
• Since 2011 new information: new occupation classification, working hours

2. Administrative Data

• Federal Employment Agency transmits data to IAB
• IAB merges social security notifications and BA data

→ (complete) individual employment biographies
  ‐ Employment history covered by social security system (since 1975)
  ‐ Unemployment benefit receipt (since 1975)
  ‐ Registered job search (since 2000)
  ‐ Participation in labor market programs (since 2000)
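The merge step described above, combining spells from several sources into one chronological biography per person, can be sketched as follows. This is a simplified illustration, not the IAB's procedure: the spell layout (`person_id`, `start`, `end`, `state`) is hypothetical, and real biographies also require handling of overlapping or conflicting spells.

```python
from collections import defaultdict


def build_biographies(*sources):
    """Merge spell records from several sources into one timeline per person.

    Each source is a list of dicts with hypothetical keys
    'person_id', 'start', 'end' (ISO date strings) and 'state'.
    """
    biographies = defaultdict(list)
    for source in sources:
        for spell in source:
            biographies[spell["person_id"]].append(spell)
    for spells in biographies.values():
        # ISO date strings sort chronologically as plain strings.
        spells.sort(key=lambda s: (s["start"], s["end"]))
    return dict(biographies)
```

Passing the employment notifications and the benefit-receipt records as two separate sources yields, for each person, a single sequence of states ordered by start date.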

[Figure labels: apprenticeship, pension]

Inconsistencies

• Purpose of data collection:
  ‐ Calculation of social security contributions and (unemployment) benefits
  ‐ Statistics

• Variables that are highly accurate:
  ‐ Sex, birthdate (included in the Social Security Number)
  ‐ Wage and the beginning and end of a job

• All other variables play a minor role and are therefore not highly accurate.