Research Data in the Social Sciences: How Much Is Being Shared?

Amy Pienta, Myron Gutmann, Jared Lyle

ICPSR, University of Michigan

Presentation at the Research Conference on Research Integrity, Niagara Falls, NY, May 16, 2009



Types of Social Science Data

MAJOR SOCIAL SCIENCE TOPICS
• Social - class, crime, social movements, race relations, culture, folklore, family, aging
• Economic - wealth, prosperity, labor, business
• Psychological - cognition, attitudes, stereotypes
• Politics - justice, democracy, public policy, public administration, international conflict

TYPES OF DATA
• Surveys, Opinion Polls, Structured Interviews, Experiments, GIS (map)
• Administrative & Historical Records
• Video, Audio, Transcripts, Text
• Web sites, Email, Blogs

How Can We Think About Data Sharing?

• Making one’s research data available for others to analyze and/or reanalyze
• Placing one’s data in the public domain
  - Data archive that has an explicit mission to preserve and disseminate data to a wide audience

Value of Data Sharing in the Social Sciences

• Replication
• Surveys are often more comprehensive than any one researcher’s needs/time
• Improve other data collections and measurement
• Reduces costs by avoiding duplicate data collection efforts
• Research training
• Data ownership larger than the PI

Many Avenues for Sharing Data in the Social Sciences

• Broad-based social science data archives
• National data archives (outside the US)
• Thematic “boutique” archives
• Institutional repositories
• Journal-based archives
• Individual/departmental websites

Why are data not shared?

• Preparing data and documentation can be enormously time consuming
• Need to protect the confidentiality of respondents
• Fear of getting “scooped”
• Lack of rewards for sharing
• Limited resources for data preparation

NSF Data Sharing Policy

National Science Foundation Important Notice 106 (April 17, 1989) states: "[NSF] expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections, and other supporting materials created or gathered in the course of the research. It also encourages awardees to share software and inventions or otherwise act to make such items or products derived from them widely useful and usable."

NIH Data Sharing Policy

The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers. Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.

Goals

To identify the “universe” of social science data that have been collected

To know how much social science data is “at risk” of being lost or has been lost (versus that which is available, preserved)

To understand the value of sharing and/or data archiving

LEADS Database at ICPSR

• NICHD funding – PI Survey about Disclosure Risks

• Library of Congress funding – Identification and Appraisal of “at risk” Social Science Data

• ORI RRI funding (NLM) – Creating a research database

What is LEADS?

A database of records containing information about thousands of scientific studies that may have produced social science data

The database contains:

Descriptive information about scientific studies we identify.

Information used to determine “fit” and “value” of a scientific study

Value-added information from bibliometric analysis, PI surveys, constructed variables

Sources of Information

• National Science Foundation
• National Institutes of Health

LEADS Screening Criteria

• Social science and/or behavioral science
• Original or primary data collection proposed, including assembling a database from existing (archival) sources
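The two screening criteria above can be read as a simple conjunction: an award qualifies only if it is social or behavioral science and it proposes collecting data, where assembling a database from archival sources counts as collection. The sketch below is only an illustration of that reading; the `Award` class and its field names are hypothetical and are not taken from the LEADS database itself.

```python
from dataclasses import dataclass


@dataclass
class Award:
    # Hypothetical fields, invented for illustration; the real LEADS
    # records are not described at this level of detail in the slides.
    is_social_or_behavioral: bool
    proposes_primary_data_collection: bool
    assembles_database_from_existing_sources: bool = False


def passes_leads_screen(award: Award) -> bool:
    """Return True when an award meets both screening criteria."""
    # Assembling a database from existing (archival) sources counts
    # as data collection under the second criterion.
    collects_data = (
        award.proposes_primary_data_collection
        or award.assembles_database_from_existing_sources
    )
    return award.is_social_or_behavioral and collects_data


survey_grant = Award(True, True)
archival_grant = Award(True, False, assembles_database_from_existing_sources=True)
theory_grant = Award(True, False)
chemistry_grant = Award(False, True)
```

Under this reading, `survey_grant` and `archival_grant` pass the screen, while `theory_grant` (no data collection) and `chemistry_grant` (outside the social and behavioral sciences) do not.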

NSF Grant Awards in LEADS

LEADS contains 17,194 awards made by NSF

LEADS spans 30 years of NSF awards - 1976 to 2005

[Figure: NSF Grants Reviewed by ICPSR. X-axis: Start Year (1970 to 2010). Y-axis: 0 to 900.]

NIH Grant Awards in LEADS

• NICHD, NIA, NIMH, NINR, AHRQ, NIAAA, NIDA, Clinical Center, NIDCD, FIC, NCI, NHLBI, NIDDK (1972+)

• 172,196 - total # awards screened

LEADS Database at ICPSR

Source                     # Records Reviewed    # Social Science Data
Recent NSF (1976+)                     17,194                    2,537
Historic NSF (Pre-1976)                96,403                    4,019
NIH (1972+)                           172,196                    6,381
Total                                 285,793                   12,937
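As a quick arithmetic check on the table above, the following sketch recomputes the totals and derives, for each source, the share of reviewed records that were identified as social science data. The counts are copied from the slide; the variable and function names are illustrative only, and the derived percentages do not appear in the original presentation.

```python
# Counts copied from the LEADS table:
# (records reviewed, records identified as social science data).
leads_counts = {
    "Recent NSF (1976+)": (17_194, 2_537),
    "Historic NSF (Pre-1976)": (96_403, 4_019),
    "NIH (1972+)": (172_196, 6_381),
}


def percent_identified(reviewed: int, identified: int) -> float:
    """Share of reviewed records identified as social science data, in percent."""
    return 100.0 * identified / reviewed


total_reviewed = sum(reviewed for reviewed, _ in leads_counts.values())
total_identified = sum(identified for _, identified in leads_counts.values())

rates = {
    source: round(percent_identified(reviewed, identified), 1)
    for source, (reviewed, identified) in leads_counts.items()
}
overall_rate = round(percent_identified(total_reviewed, total_identified), 1)

print(total_reviewed, total_identified)  # 285793 12937
print(rates)
print(overall_rate)
```

The recomputed totals match the last row of the table. Recent NSF awards have the highest share (about 14.8%) compared with about 4.2% for historic NSF awards and about 3.7% for NIH awards.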

Results: Total and By Funding Agency

[Bar chart, y-axis 0 to 7]
Total (n=7,040): 3.8
NIH (n=4,719): 2.4
NSF (n=2,321): 6.6
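The three values on this slide (3.8, 2.4, 6.6) can be checked for internal consistency: if 2.4 belongs to NIH and 6.6 to NSF, their sample-size-weighted average should come out close to the Total figure of 3.8. The sketch below assumes that pairing, which follows the order of the labels on the slide, and treats the values as percentages. Because the published values are rounded, the check is approximate.

```python
# (sample size, value in percent) per agency, paired in slide order.
# The pairing is an assumption based on label order, not stated explicitly.
agency_results = {
    "NIH": (4_719, 2.4),
    "NSF": (2_321, 6.6),
}

total_n = sum(n for n, _ in agency_results.values())

# Pooled value: each agency's value weighted by its sample size.
pooled = sum(n * value for n, value in agency_results.values()) / total_n

print(total_n)           # 7040
print(round(pooled, 1))  # 3.8
```

The sample sizes add up to the reported total of 7,040, and the pooled value rounds to 3.8, so the Total, NIH, and NSF figures are consistent under this pairing.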

Results: By Award Year

[Bar chart, y-axis 0 to 5. Bar labels as extracted: "54.1" and "2.5"; the first label appears to fuse the values for two of the three periods.]
1985-1990 (n=1,989)
1991-1996 (n=2,368)
1997-2001 (n=2,683)

Results: By Gender of PI

[Bar chart, y-axis 0 to 6]
Women (n=2,820): 1.8
Men (n=4,178): 5.1

NSF & NIH Funded Data Collections: Where are they today? N=1,544

[Bar chart, y-axis 0 to 60]
Data Are Archived: 14.2%
Has Copy of Data: 58.7%
Data Are Lost: 25.7%
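To give a sense of scale, the percentages on this slide can be converted into approximate counts out of the N=1,544 data collections. The sketch assumes the three percentages map to the three categories in the order shown (archived, has copy, lost); the counts are derived here and are not reported in the presentation.

```python
N = 1_544  # data collections covered by this slide

# Percentages paired with categories in slide order (an assumption).
shares = {
    "Data are archived": 14.2,
    "Has copy of data": 58.7,
    "Data are lost": 25.7,
}

# Approximate number of data collections in each category.
approx_counts = {label: round(N * pct / 100) for label, pct in shares.items()}

# Share not covered by the three categories (for example, rounding or other responses).
unaccounted_pct = round(100 - sum(shares.values()), 1)

print(approx_counts)
print(unaccounted_pct)  # 1.4
```

Under these assumptions, roughly 219 collections are archived, about 906 are still held by the investigator, and about 397 are lost. The three percentages sum to 98.6%, leaving about 1.4% unaccounted for on the slide.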

LEADS: How Data Are Lost

Data Intentionally Discarded

“I generally keep data for…10 years beyond the last time I do something with them.”

“The material…was considered sensitive data. Institutional review boards.. required us to promise to destroy the data after a certain period of time...”

“As I retired…I simply didn’t have the room to store these data sets at my house.”

LEADS: How Data Are Lost

Unintentionally Lost

“Some data were collected, but the data file was lost in a technical malfunction.”

“The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”

Conclusion & Limitations

• Most NIH and NSF funded social science data are not publicly archived
  - Lower Bound Estimate 3.8%
  - Upper Bound Estimate 14.2%

Limitations
• Selectivity abounds (e.g. Harvard Dataverse Catalog; PI Pilot Survey)
• Have not taken into account informal data sharing