Data Exchange, Data Citation:
An overview of some community work
Todd Carpenter, Executive Director, NISO
May 20, 2012 Council of Science Editors
May 20, 2012 CSE: Data Standards and Data Citation Issues
National Information Standards Organization
Non-profit industry trade association accredited by ANSI
110+ Organizational Members
Divided roughly in thirds among publishers, libraries and support service providers
Mission of developing and maintaining technical standards related to information, documentation, discovery and distribution of published materials and media
Volunteer-driven organization: 400+ contributors spread across the world
Standards are familiar, even if you don’t notice
Image: DanTaylor Image: Joel Washing
Data Publishing
Large Scale Data-Driven Science
Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
Science data isn’t what it used to be
Image: Walters Art Museum Image: Domenico, Caron, Davis, et al.
Challenges of Data Publishing/Sharing
Data
– is massive in scale.
– can be constantly changing.
– is difficult to describe/catalog.
– is often difficult to integrate with other data.
– can be complex to analyze (tools required).
– is hard to peer review.
– is challenging to curate.
– is challenging to preserve.
Data – Complexities
• Storage
• Provenance
• Legal Issues
• Technical Interoperability
• Metadata
• Privacy
• Identification
• Citation
Problem with data, e.g.: mutability
The fact that data sets are constantly changing poses the following problems (not exhaustive):
– How does someone rerun your experiment when the data set is no longer what it was when you ran it?
– For citation: you have to cite the subset of the data that existed on XXX date when you ran your analysis.
– How can you preserve a snapshot of a dataset?
– How do you describe the dataset at that point in time?
– Problems of managing the metadata of each data deposit
– Problems of synchronization across repositories
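One way to picture the "cite the data as it existed on a given date" problem is a small log of dated deposits, each recorded by a content digest. The sketch below is illustrative only: `SnapshotLog`, `deposit`, and `as_of` are hypothetical names, the sample data is invented, and a real repository would also need to store the content itself plus descriptive metadata.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
import hashlib


@dataclass
class SnapshotLog:
    """Hypothetical in-memory log of dated dataset deposits (not a real repository API)."""
    dates: list = field(default_factory=list)
    digests: list = field(default_factory=list)

    def deposit(self, when: date, content: bytes) -> str:
        """Record the SHA-256 digest of the dataset as deposited on `when`."""
        if self.dates and when <= self.dates[-1]:
            raise ValueError("deposits must be recorded in increasing date order")
        digest = hashlib.sha256(content).hexdigest()
        self.dates.append(when)
        self.digests.append(digest)
        return digest

    def as_of(self, when: date) -> str:
        """Return the digest of the latest deposit made on or before `when`."""
        i = bisect_right(self.dates, when)
        if i == 0:
            raise LookupError("no snapshot existed on that date")
        return self.digests[i - 1]


# Invented sample data: the dataset grows between two deposits.
log = SnapshotLog()
v1 = log.deposit(date(2012, 1, 15), b"station,temp\nA,11.2\n")
v2 = log.deposit(date(2012, 4, 2), b"station,temp\nA,11.2\nB,9.8\n")

# An analysis run on 1 March 2012 would cite the first deposit, not the later one.
cited = log.as_of(date(2012, 3, 1))
```

A citation that carries both the date and the digest lets a later reader tell which state of the dataset an analysis actually used, even after further deposits.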
Data Complexity – Provenance
• How can you be certain that the data set you are looking at hasn’t been altered or changed?
• Can you trust the data source?
• Has the data been updated, added to, or appended since the analysis was initially run?
• Dealing with issues of abstraction
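The first and third questions are often approached with a fixity check: compare a digest of the file in hand against a digest published alongside the dataset. The sketch below assumes the publisher supplied a SHA-256 hex digest; `sha256_hex` and `is_unaltered` are hypothetical helper names and the sample bytes are invented.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of `content` as a lowercase hex string."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, published_digest: str) -> bool:
    """True if `content` hashes to the digest recorded when the data was published."""
    return hmac.compare_digest(sha256_hex(content), published_digest.lower())


# Invented sample data standing in for a deposited file.
original = b"id,value\n1,3.14\n"
published = sha256_hex(original)

ok = is_unaltered(original, published)                    # same bytes as deposited
tampered = is_unaltered(original + b"2,2.71\n", published)  # a row was appended later
```

A matching digest shows only that the bytes are unchanged since the digest was recorded. It says nothing about whether the source itself can be trusted, which remains a separate question.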
Data Citation
We all know what a citation looks like, right?
But what about citing a data set?
Source: Citations for SEER Databases
Source: Global Land Cover Facility
Source: International Polar Year
Source: ICPSR
Source: The Economist
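The sources above each format a dataset citation differently. As a rough illustration of what a consistent form could carry, the sketch below renders one from structured fields. The element order (creators, year, title, version, publisher, access date, identifier) is an assumption loosely modeled on patterns recommended by groups such as DataCite, not an agreed standard; the `DataCitation` class and every value in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements of a dataset citation."""
    creators: list
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None   # matters when the dataset changes over time
    accessed: Optional[str] = None  # date the cited state of the data was retrieved

    def render(self) -> str:
        """Join the elements into a single citation string."""
        parts = [f"{'; '.join(self.creators)} ({self.year}): {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        if self.accessed:
            parts.append(f"Accessed {self.accessed}.")
        parts.append(self.identifier)  # no trailing period, so the identifier stays copyable
        return " ".join(parts)


# All values below are invented placeholders.
citation = DataCitation(
    creators=["Example, A.", "Sample, B."],
    year=2012,
    title="Hypothetical Sea Ice Extent Dataset",
    publisher="Example Data Archive",
    identifier="doi:10.0000/example",
    version="2.1",
    accessed="2012-05-20",
)
text = citation.render()
```

Version and access date are optional here, but for a dataset that keeps changing they are the elements that tie the citation to a specific state of the data.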
CODATA Group on Data Citation
• Approved by the CODATA 27th General Assembly in Cape Town in October 2010
TASKS
– Survey existing literature and existing data citation initiatives.
– Obtain input from stakeholders in library, academic, publishing, and research communities.
– Hold at least one meeting and a workshop to help establish a solid foundation of the state of the art and practices in this area.
– Work with the ISO and major regional and national standards organizations to develop formal data citation standards and good practices.
Issues Requiring Attention
Accomplishments so far
• Compiled a significant bibliography of 200+ references
• Conducted a survey for repository managers, publishers, and funding organizations.
– More than 60 narrative responses
• Upcoming meeting in Copenhagen in June
• Final meeting in Taipei in October with final report to follow.
BRDI Meeting in Berkeley
• Two-day meeting in Berkeley, CA on data citation issues in August 2011
• Covered topics critical to data citation
– Administration
– Publishing
– Identification
– Funding support
– Legal
– Policy
– P&T
– Provenance
• National Academies report to be published in June
What does the future hold?
Standards for Data: Work areas?
Many potential areas for work in sharing of scientific data, including:
• Author/Contributor disambiguation & other issues
• Data Equivalence – How does one know that this thing and that are equivalent (i.e., contain the same data)?
• Systemic metadata – What is the form of this information? What are its structural components?
• Archival issues – Storage, physical level, but also migration issues
• Bibliographic information for discovery, delivery and reuse
• Rights issues – Ownership, recognition, sharing, privacy
ORCID – Author disambiguation
Founded by CrossRef, Thomson-Reuters, Nature in 2009
Now 328 participant organizations, 50 of which have provided sponsorship funding
Prototype technology
Full launch in Fall 2012
International Standard Name ID
• ISO 27729:2012 Information & Documentation -- International Standard Name Identifier
• Another identifier for public identity of parties
• Application for institutions (NISO I2)
• Launched in Spring ‘12
Data Equivalence
• Creation of a high-level conceptual model of data description
• A “FRBR” for data
• Defines the distinctions between states & transformations of data
• Basis for identification & description
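To make one narrow notion of equivalence concrete, the sketch below fingerprints a table so that two copies with the same records compare equal even when rows or columns arrive in a different order. This is only an assumption about what "contain the same data" might mean at the simplest level; `fingerprint` is a hypothetical helper, the records are invented, and a conceptual model of states and transformations would have to cover far more (unit conversions, format migrations, derived subsets).

```python
import hashlib
import json


def fingerprint(rows: list) -> str:
    """Digest of a table that ignores row order and key order within each row."""
    # Serialize each row with sorted keys, then sort the serialized rows,
    # so that only the set of records (with duplicates) affects the digest.
    canonical = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


# Invented sample records.
a = [{"station": "A", "temp": 11.2}, {"station": "B", "temp": 9.8}]
b = [{"temp": 9.8, "station": "B"}, {"temp": 11.2, "station": "A"}]  # reordered copy of a
c = [{"station": "A", "temp": 11.2}]                                 # one record missing

same = fingerprint(a) == fingerprint(b)
differs = fingerprint(a) != fingerprint(c)
```

The comparison is deliberately strict: a value stored as the number 11.2 and the string "11.2" would count as different, which is exactly the kind of distinction a shared model of data states would need to settle.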
Too many players, doing too many things?
Source: XKCD
Other Initiatives & Links
• CODATA/ICSTI – Task Group on Data Citation Standards and Practices
• DataCite – datacite.org
• NISO/NFAIS Group on Supplemental Journal Materials
• National Academies Board on Research Data and Information
– For Attribution: Developing Data Attribution and Citation Practices and Standards, in Berkeley, CA, August 2011. Report due in January
• Cite Datasets and Link to Publications, by Digital Curation Centre
• ORCID – about.orcid.org
• International Standard Name ID – www.isni.org
• ScienceCommons – Scholar’s Copyright Project
• International Council for Scientific and Technical Information (ICSTI) – Multimedia Search and Retrieval, and Interactive Journal Articles Projects
• Dublin Core Metadata Initiative – Science and Metadata Community
• DataNet projects funded by NSF, launched in 2009
• NISO/OAI ResourceSync project funded by Alfred P. Sloan Foundation
Information Standards Quarterly (ISQ)
Summer 2010 Issue of ISQ
Special issue on Enhanced Journal Articles
Article-level Enhancements in the Humanities and Social Sciences
Hosting Supplemental Materials
Archiving Supplemental Materials
DOIs - Linking and Beyond
Supplemental Materials Survey
Report on NISO/NFAIS project
NOW AVAILABLE OPEN ACCESS
www.niso.org/publications/isq
Questions?
Todd Carpenter, Executive Director
National Information Standards Organization (NISO)
One North Charles Street, Suite 1905
Baltimore, MD 21201 USA
+1 (301) 654-2512
www.niso.org