TRANSCRIPT
COMPLETING THE CIRCLE:
PERSPECTIVES ON INTEGRATING DATASETS IN BASIC RESEARCH AND DISCOVERY
Panelists: Mary Vardigan, John Kunze, Michael Diepenbroek, Nigel Robinson
January 31, 2013
PANELIST CONTACT INFORMATION
Nigel Robinson (moderator, presenter), Director, York Operations, Thomson Reuters, United Kingdom. go.thomsonreuters.com/DCI
Mary Vardigan, Assistant Director, Inter-university Consortium for Political and Social Research (ICPSR), United States. http://www.icpsr.umich.edu/icpsrweb/landing.jsp
John Kunze, Associate Director, University of California Curation Center, California Digital Library, United States. http://www.cdlib.org/
Michael Diepenbroek, Managing Director, PANGAEA Data Publisher for Earth & Environmental Sciences (ICSU World Data System - WDS), Germany. www.pangaea.de
AGENDA
INTRODUCTION
GUEST SPEAKERS
Q&A
THE DIGITAL UNIVERSE EXPANSION
DIGITAL SCHOLARSHIP
Digital Scholarship: Very visible within the literature as a concept. Articles, projects, university labs all devoted to digital scholarship in various ways.
Interested Parties: Authors / researchers; research administrators; librarians, data archivists; publishers; grant funding organizations.
Content: Discipline-specific and multidisciplinary content. Needs and requirements vary by discipline. Diverse content formats, with few standards. Includes collaboration and communications.
“Data is the new gold” – Neelie Kroes, EU Digital Agenda Commissioner
THE INCREASING VISIBILITY OF DATA
Grant funding agencies
Journal publishers: publisher website, data journals
Data repositories & registration agencies
SHARING RESEARCH DATA: HOW CAN WE ENCOURAGE GOOD PRACTICE?
Mary Vardigan Assistant Director, ICPSR January 31, 2013
OUTLINE OF PRESENTATION
What is ICPSR?
Importance of data sharing
Ways ICPSR is encouraging good practice
Benefits of the data citation index
WHAT IS ICPSR?
Repository of social science data established in 1962
Over 8,000 studies, over 60,000 datasets
Membership-based organization – over 700 members
Source for training in statistics and data curation through the summer program
www.icpsr.umich.edu
IMPORTANCE OF DATA SHARING
Open scientific inquiry – Findings can be verified
New research – Extend original findings, address new questions
Reduced costs – Large collections like the General Social Survey intended for sharing (over 9,000 publications written)
Training – Students benefit from using others’ data
MORE ON DATA SHARING
Colleagues surveyed principal investigators on data sharing behavior
Findings: When data are shared, two to three times as many primary publications result [1]
Data sharing leads to more science, more knowledge
[1] A. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
ENCOURAGING GOOD PRACTICE
PROMOTE STANDARDS FOR DATA CITATION
Have provided data citations since 1990, DOIs since 2008
With Data-PASS partners, contacted major journals in sociology, economics, political science
– Highlighted past data citation practices
– Emphasized use of citations, persistent identifiers
– ASR revised its submission guidelines to reflect data citation requirement
BUILD COMMUNITY ENGAGEMENT
Sloan Foundation Grant: Three meetings
Establish consistent citation of data in social science journals
Promote research transparency and replication
Optimize editorial workflows related to data
Develop common standards/solutions for repositories
Explore models for sustainability of repositories
Challenge grants – three to five grants of up to $20,000 each (announcement coming soon)
OFFER RICH METADATA
Metadata are key to discovery and to effective data use
ICPSR’s 8,000 studies have structured XML metadata compliant with the DDI standard
We also provide searchable metadata at the question/variable level
LINK DATA AND PUBLICATIONS
ICPSR’s Bibliography of Data-Related Literature – 60,000 citations
Two-way linking: Data link to publications, publications to data
Forms the basis for information in DCI
Proper data citation practice with DOIs can automate links between data and publications
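A minimal sketch of how DOIs in reference lists could drive the two-way linking described on this slide. It is illustrative only and not ICPSR's or the Data Citation Index's actual method. The DOI pattern is a loose approximation, the article identifier is hypothetical, and treating the `10.3886/` prefix (taken from the ICPSR citation shown later in this webinar) as the marker of a data DOI is an assumption.

```python
import re
from collections import defaultdict

# Loose DOI pattern (assumption): "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text):
    """Return DOIs found in free text, with trailing punctuation trimmed."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


def build_links(reference_lists, data_prefixes=("10.3886/",)):
    """Build both directions of the link: publication -> data and data -> publications.

    data_prefixes lists the DOI prefixes assumed to belong to data repositories.
    """
    pub_to_data = defaultdict(set)
    data_to_pubs = defaultdict(set)
    for pub_id, references in reference_lists.items():
        for reference in references:
            for doi in extract_dois(reference):
                if doi.startswith(tuple(data_prefixes)):
                    pub_to_data[pub_id].add(doi)
                    data_to_pubs[doi].add(pub_id)
    return pub_to_data, data_to_pubs


# "article-001" is a hypothetical publication identifier.
references = {
    "article-001": [
        "U.S. Dept. of Justice, Bureau of Justice Statistics (1996): "
        "MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. "
        "Version 1. http://dx.doi.org/10.3886/ICPSR09907.v1"
    ]
}
pub_to_data, data_to_pubs = build_links(references)
```

Because the DOI is a stable string, the same lookup serves both directions: a data record can list the publications citing it, and a publication can list the data it used.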
PARTICIPATE IN THE DATA CITATION INDEX
Reinforces good practice – linking publications and data, data citation, metadata, access
Brings greater visibility for data resources and data producers
Elevates status of research data
Highlights DOIs for data prominently
Broadens resource discovery across disciplines
Shows impact of investment in data to funders
SUGGESTIONS FOR THE DATA CITATION INDEX
Add more links between data and publications for more repositories
Integrate data fully into the Web of Knowledge, using appropriate language (e.g., “this dataset has been cited”)
THANK YOU…
Mary Vardigan, ICPSR
LIBRARY TOOLS SUPPORTING DATA-RICH RESEARCH
UNIVERSITY OF CALIFORNIA CURATION CENTER, CALIFORNIA DIGITAL LIBRARY
THE RESEARCH DATA PROBLEM
Journal article versus research data:
– Uniquely and persistently identified: research data, nope
– Concept of “publish”: research data, not really
– Multiple copies: research data, typically one
– Easily findable: research data, difficult
– Services (impact metrics, citation tracking, etc.): research data, nope
Research data is seen as a second-class citizen in the scholarly record.
WHERE CAN LIBRARIES MAKE A DIFFERENCE?
[Diagram: Research & Scholarship Lifecycle, with the stages Collect, Publish, Share, Save, Research, and Create Knowledge]
COLLECT> PUBLISH> SHARE> SAVE> RESEARCH
Capture today’s web; build tomorrow’s archives
Create, edit, share, and save data management plans
Open source curation add-in for Microsoft Excel
COLLECT> PUBLISH> SHARE> SAVE> RESEARCH
Create and manage persistent identifiers: ARKs, DOIs, etc.
An infrastructure to publish and get credit for sharing research data
COLLECT> PUBLISH> SHARE> SAVE> RESEARCH
Curation repository: store, manage, preserve, and share research data
Open deposit, open access repository for spreadsheet data
Data Observation Network for Earth
COLLECT> PUBLISH> SHARE> SAVE> RESEARCH
What’s missing to complete the “incentive” circuit?
Impact measures, citation tracking
“Connecting the data to the research it informs”
Altmetrics tools to measure non-traditional products and uses, etc.
THE REST OF THE STORY
www.cdlib.org/uc3
dataup.cdlib.org
www.escholarship.org
wokinfo.com/products_tools/multidisciplinary/dci/
RESEARCH DATA ENTERS SCHOLARLY COMMUNICATION: TOWARDS AN INFRASTRUCTURE FOR DATA PUBLICATION IN THE EMPIRICAL SCIENCES
Michael Diepenbroek, Hannes Grobe, Uwe Schindler PANGAEA® - AWI / MARUM
PREREQUISITES FOR DATA PUBLICATION?
Open Access: licenses (Creative Commons), business models
Persistent identification
Trusted & certified archives
[Figure: effort needed and value for data and articles, shown for researchers and publishers. Source: PARSE Insight, Report 3.4, www.parse-insight.eu]
PREREQUISITES FOR DATA PUBLICATION?
QA/QC -> review procedures
(Meta)data & interoperability standards (machine readable)
File formats shown: DOC, CSV, NetCDF, TXT, XML, XLSX, XLS, GRIB, …
OECD principles and guidelines for access to research data (2007): Professionalism, Interoperability, Quality, Efficiency
[Figure: a collection of individual data sets]
PREREQUISITES FOR DATA PUBLICATION?
[Figure: data and articles placed along a time axis]
Citability
PREREQUISITES FOR DATA PUBLICATION?
COLLABORATION BETWEEN DATA ARCHIVES & SCIENCE JOURNALS
Linking editorial workflows
Linking services
ICSU WDS – ROLES & RELATIONS IN A FEDERATED SYSTEM
Publishers: commercial, open access (e.g. ESSD journal), cross-referencing
Data Collection & Processing Facilities: QA/QC, data products, also data rescue
Data Archiving & Publication Facilities: certified repositories
Related Networks & Programs: GEOSS, GMES, WMO-IS, IOC, etc.
Metadata & Data Services: web portals, catalogues
Visualisation & Analysis: compute systems, virtual labs, GIS systems
Research Institutions: universities, research institutes
Research Projects / Programs: national, EU, international
Libraries: DOI registry, interdisciplinary catalogues
Research Facilities: satellites, vessels, observatories, alert systems, etc.
Education & Outreach
Scientific Communities & Other Stakeholders
Datasets and Data Citation Index, 2013 ~ www.icsu-wds.org
BIBLIOMETRICS
35% to 69% more citations
Courtesy of Jon Sears (AGU)
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
PANGAEA MULTIDISCIPLINARY DATA ARCHIVE AND PUBLISHER
Hydrosphere, Lithosphere, Atmosphere, Cryosphere
Total number of data sets: ~350,000. Data items: ~6.3 billion
LINKING INFRASTRUCTURE
[Diagram: PANGAEA linking (WS, OAI-PMH) connects the archive with publishers (Elsevier, Nature, Springer, Wiley) and with bibliometrics and catalogue services (Thomson, Scopus, WDS, GEOSS)]
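OAI-PMH, named on this slide as one of the linking interfaces, is a metadata harvesting protocol driven by plain HTTP requests. The sketch below only builds request URLs using standard OAI-PMH arguments (`verb`, `metadataPrefix`, `from`, `set`, `resumptionToken`). The endpoint address is a hypothetical placeholder and not PANGAEA's actual service URL, and the function names are illustrative.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a first page of records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def next_page_url(base_url, resumption_token):
    """Build the follow-up request; the token replaces all other arguments."""
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
first_request = list_records_url("https://repository.example.org/oai", from_date="2013-01-01")
```

A harvester such as a catalogue or citation index would fetch the first URL, read the records, and keep requesting `next_page_url` for as long as the response carries a resumption token.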
DATA PUBLISHING – CROSS-REFERENCING
LINKING INFRASTRUCTURE
[Diagram: multiple data archives linked with publishers, bibliometrics, and catalogues]
ICSU WDS PERSPECTIVE
Certified Data Archives
Registries
Bibliometric Services
Catalogues
Web of Knowledge, Google Scholar, Scopus
Thomson Reuters Citation Indexes
Crossref, DataCite, ORCID
Journals
ICSU WDS
COLLABORATING TO CREATE THE DATA CITATION INDEX NIGEL ROBINSON
DEPOSITION OF DATA BY RESEARCHERS
Q16. Where do you place your non-traditional scholarly output to make it available to others? (n=471)
Response options, in chart order: Publisher website; Repository managed by a third party (e.g., domain-…; Department or institutional repository; Personal website; Other
Percentages, in chart order: 24%, 36%, 47%, 51%, 17%
RESEARCHERS NOT RECEIVING CREDIT
Barriers to creating and sharing data: Work is not adequately exposed or accredited
Data repositories do not have clear standards or mechanisms in place for doing so
BARRIERS TO RESEARCHERS CITING DATA
Researchers agree that data should be cited, but there are currently no universally accepted standards for citing data
“Lack of knowledge about standards for citation and of proper scholarly recognition and/or evaluation of such materials…”
“…cumbersome citation formats including very long internet addresses.”
“Incomplete citation information available (dates and real author names as distinct from aliases).”
DATA CITATION BEHAVIOUR
Current citation style (in full text of article)
Desired/future citation style (as part of cited references)
U.S. Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. Version 1. Inter-university Consortium for Political and Social Research
http://dx.doi.org/10.3886/ICPSR09907.v1
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes
extracellular a-synuclein. Gene
acc=GSE11574
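The desired citation style above follows a regular pattern: creators, year, title, optional version, publisher, identifier. A minimal sketch of that pattern follows. The function name and parameters are illustrative assumptions, not a published citation standard, and the values are taken from the ICPSR example on this slide.

```python
def format_data_citation(creators, year, title, publisher, identifier, version=None):
    """Assemble a data citation in the order shown on the slide:
    Creators (Year): Title. Version N. Publisher. Identifier
    """
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version is not None:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)


citation = format_data_citation(
    creators=["U.S. Dept. of Justice, Bureau of Justice Statistics"],
    year=1996,
    title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988",
    publisher="Inter-university Consortium for Political and Social Research",
    identifier="http://dx.doi.org/10.3886/ICPSR09907.v1",
    version=1,
)
print(citation)
```

Keeping the identifier as the last, separate element makes it easy for an indexing service to pick it out of a cited-references list.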
OBSERVED RESEARCHER PROBLEMS
Access & discovery
Citation standards
Lack of willingness to deposit and cite
Lack of recognition / credit
WHERE DO WE START?
Enable the discovery of data repositories, data studies and data sets in the context of traditional literature
Help researchers find data sets and studies and track the full impact of their research output
Provide expanded measurement of researcher and institutional research output and assessment
Facilitate more accurate and comprehensive bibliometric analyses
DATA REPOSITORIES
Over 500 repositories identified
INDEXING A DATA REPOSITORY ON WEB OF KNOWLEDGE
Record Types
Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.
Data Study: Descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.
Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.
Microcitation (nanopublication): An assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
DCI record: data repository, data study, data set, microcitation
Descriptive metadata feed from repository -> repository raw metadata is analysed -> metadata added
DATA REPOSITORY MODEL
Repository
Data Study
Data Set
Microcitation
Data Study, Data Set and Microcitation levels are optional
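The repository model above can be sketched as nested record types. This is an illustrative data structure only, not the Data Citation Index schema. All class and field names are assumptions, and the example values are hypothetical. Because the lower levels are optional, every list defaults to empty and a repository may hold data sets or microcitations directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Microcitation:
    """An attributed assertion made of a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str
    author: str


@dataclass
class DataSet:
    title: str
    microcitations: List[Microcitation] = field(default_factory=list)


@dataclass
class DataStudy:
    title: str
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class Repository:
    name: str
    data_studies: List[DataStudy] = field(default_factory=list)
    data_sets: List[DataSet] = field(default_factory=list)
    microcitations: List[Microcitation] = field(default_factory=list)


def levels_present(repo):
    """Return the record levels a repository actually provides."""
    levels = {"repository"}
    data_sets = list(repo.data_sets)
    microcitations = list(repo.microcitations)
    if repo.data_studies:
        levels.add("data study")
        for study in repo.data_studies:
            data_sets.extend(study.data_sets)
    if data_sets:
        levels.add("data set")
        for data_set in data_sets:
            microcitations.extend(data_set.microcitations)
    if microcitations:
        levels.add("microcitation")
    return levels


# Hypothetical example records.
assertion = Microcitation("gene X", "is associated with", "condition Y", "Hypothetical Author")
example = Repository(
    name="Example Repository",
    data_studies=[DataStudy("Example Study", [DataSet("Example Data Set", [assertion])])],
)
```

A repository that supplies only repository-level metadata is still a valid record: `levels_present(Repository("Minimal Repository"))` reports just the repository level.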
CHALLENGES
Metadata availability
– Lack of resources
– Lack of expertise
Metadata quality
– Metadata inconsistencies
Data repositories are not static
Partnerships
DATA CITATION INDEX - METADATA PARTNERSHIPS
[Two diagrams showing Repository 1, Repository 2, Repository 3, DataCite, and the Data Citation Index]
COLLABORATION BENEFITS
Any repository providing metadata to the aggregator is included in the Data Citation Index
Uniform data
Faster and more frequent updates
QUESTION & ANSWER
Please type any questions into the WebEx chat panel