summary of data citation synthesis activity & review
Prepared for
Data Citation Synthesis Group Open Workshops
Sept 2013
Summary of data citation synthesis activity &
Next steps for review <bit.ly/dsynthrev>
Dr. Micah Altman <[email protected]>
MIT Libraries
Joan Starr
[email protected] Digital Library
Summary of synthesis activity & Next steps for review
What has been done?
Refining Approaches to Data Citation
2000-2004
Example systems: NESSTAR, Virtual Data Center
Core recommendations: Cite research data in publications; use persistent identifiers; facilitate direct access to data through URIs
Key references: [Ryssevik & Musgrave 2001] [Altman, et al. 2001]
2005-2009
Example systems: Dataverse Network System, TIB Data DOI Registration
Core recommendations: Include versioning, fixity, and granularity for verification; use permanent institutions; facilitate attribution
Key references: [Buhneman 2006] [Altman & King 2007] [OECD 2009]
2010-
Example systems: DataCite; Thomson-Reuters Data Citation Index; FigShare; Data Dryad
Core recommendations: Include data citations in standard locations; index data citations in catalogs; facilitate machine understanding
Key references: [NAS 2012] [DCC 2012] [Force11 2013] [CODATA 2013]
Integrating Current Recommendations
Disciplinary Practices; Repository Practices
Summative Recommendations
Synthesis Principles
Synthesis Group Activity
• Hosted by Force 11
  – Charter here: http://www.force11.org/node/4432
• Formed early summer
• Meeting weekly
• Reviewed current key recommendations & engaged lead authors:
  – Force 11/Amsterdam Manifesto [Force11 2013]
  – CODATA/“Out of Cite” Recommendations [CODATA 2013]
  – DCC Guide [DCC 2012]
  – DataCite/Metadata Core [Datacite 2012]
  – Research Data Alliance
• Identified core principles that are consistent across recommendation groups
• Formulated a draft synthesis of principles
• Agreed to use key documents above for definitions of terms, detailed explanation of issues
• Out of scope: specific detailed standards, protocols, infrastructure, tools
Yesterday
• Open Workshop
• Line-by-line review of draft
• Open editing of document
  – In shared document
  – Using revision control
• Convergence on principles
  – 8 principles revised and approved by consensus
  – 1 recommendation struck
  – 1 recommendation tabled for discussion today
• Summary
  – Substantial core of agreement: need for citation; use of persistent identifiers; support for human and machine access; facilitation of verification and attribution
  – Maintain conceptual boundaries among data citation, publication & evaluation
  – Recognize that terminology cannot always be aligned with colloquial or disciplinary usage
The principles
1. Importance
2. Credit and attribution
3. Evidence
4. Unique identification
5. Access
6. Persistence
7. Versioning and granularity
8. Interoperability and flexibility
Open Question: Data Repository Recommendations
6. Persistence
Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
Data citations should be resolvable to data stored in repositories with a commitment and demonstrated capability to maintain long term access. Data stored in such repositories may not always be publicly accessible. Although such repositories should be committed to long term maintenance and preservation of data, the nature of digital data is such that they may not persist indefinitely.
Review Process
• Synthesis group will supplement today’s consensus principles with background:
  – Illustrative examples for each recommendation
  – References with each principle to detailed discussion of embedded issues in prior reports
  – Glossary
• Public release of draft for open online commentary
• Integration of commentary and release of final draft
Questions for Review & Decisions
• Nomination of additional members to synthesis group for preparation of summary material (glossary, references, examples, preamble)?
  – Decision: Anyone in attendance who can substantively (if not officially) represent a group
  – Decision: Identify additional key organizations for commentary
• Public release of draft: when, to whom?
  – Decision: Available for open public commentary mid-November
  – Decision: Will specifically request comments from key organizations, including:
    • Organizations listed earlier (Force11, DCC, CODATA, ESIP, RDA, DataVerse, Data-PASS, DataCite)
    • Additional suggested organizations: NLM, ARL
    • Additional organizations identified by synthesis group
• Open commentary via mailing list & force11 website. Period for commentary?
  – Decision: 6-8 weeks for public commentary
• Integration of commentary by synthesis group and release of updated draft. Number of drafts necessary? When to declare “done”?
  – Decision: Single round of revisions by synthesis group. Will then seek endorsements.
Synthesis Group Contacts
About the synthesis group: http://www.force11.org/node/4432
Questions for the synthesis group: [email protected]
Consensus document, with revision history: https://docs.google.com/document/d/1KosNqBPgE8ziWDuJgBIrk20KxcOXoZdAt_TdJV3xoz8/edit?usp=drive_web
Key Recommendations
• [Force11 2013] M. Crosas, T. Carpenter, C. Borgman, D. Shotton. 2013. The Amsterdam Manifesto on Data Citation Principles. Force11.
• [CODATA 2013] CODATA-ICSTI Task Group on Data Citation. 2013. Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal.
• [DCC 2012] A. Ball, M. Duke. 2012. Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre.
Additional References
• [Ryssevik & Musgrave 2001] J. Ryssevik, S. Musgrave. 2001. The Social Science Dream Machine. Social Science Computer Review.
• [Altman, et al. 2001] M. Altman, et al. 2001. A Digital Library for the Dissemination and Replication of Quantitative Social Science Research: The Virtual Data Center. Social Science Computer Review.
• [Buhneman 2006] P. Buneman. 2006. How to Cite Curated Databases and Make Them Citable. SSDBM ’06.
• [Altman & King 2007] M. Altman, G. King. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib.
• [OECD 2009] T. Green. 2009. We Need Publishing Standards for Datasets and Data Tables. OECD.
• [NAS 2012] P. Uhlir (ed.). 2012. For Attribution -- Developing Data Attribution and Citation Practices and Standards. National Academies of Sciences.
• [Datacite 2012] DataCite Metadata Schema, v3.0. http://schema.datacite.org/
Appendix: Principles
The principles
1. Importance
Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence
Where a specific claim rests upon data, the corresponding data citation should be provided.
4. Unique identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
5. Access
Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
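Illustration (not part of the consensus text): a minimal Python sketch of how principles 4 and 5 might look in practice, assuming a DOI is the community's chosen identifier. The DataCitation class, its field names, the simplified DOI pattern, and the example DOI 10.1234/example.dataset are all hypothetical and chosen only for illustration. The https://doi.org/ prefix is the standard DOI resolver.

```python
import re
from dataclasses import dataclass

# Simplified shape check for a DOI (an assumption for this sketch,
# not the full DOI syntax).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; field names are illustrative only."""
    authors: str
    year: int
    title: str
    repository: str
    doi: str  # persistent, globally unique identifier (principle 4)

    def __post_init__(self):
        # Reject identifiers that are not even DOI-shaped.
        if not DOI_PATTERN.match(self.doi):
            raise ValueError(f"not a DOI-shaped identifier: {self.doi!r}")

    def resolver_url(self) -> str:
        # Machine-actionable form: a URL that software can dereference.
        return f"https://doi.org/{self.doi}"

    def human_readable(self) -> str:
        # Human-readable form that still carries the resolvable identifier
        # (principle 5: access for both humans and machines).
        return (f"{self.authors} ({self.year}). {self.title}. "
                f"{self.repository}. {self.resolver_url()}")


# Hypothetical dataset and DOI, for illustration only.
example = DataCitation(
    authors="Doe, J.",
    year=2013,
    title="Example Survey Data",
    repository="Example Repository",
    doi="10.1234/example.dataset",
)
print(example.human_readable())
```

The same record yields both a printable citation and a resolvable link, so the two forms cannot drift apart.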
6. Persistence
Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
[more to be decided upon]
7. Versioning and granularity
Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
8. Interoperability and flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
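Illustration (not part of the consensus text): a minimal Python sketch of what "verifiably link" in principle 7 could mean, assuming the citation records a version label, a subset description, and a fixity digest. A plain SHA-256 digest of the file bytes stands in here for whatever fixity method a community adopts. The VersionedCitation class, its fields, the identifier, and the sample data are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fixity(data: bytes) -> str:
    """Return a SHA-256 hex digest of the exact bytes cited."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class VersionedCitation:
    """Hypothetical record; field names are illustrative only."""
    identifier: str  # persistent identifier of the dataset
    version: str     # which version was used
    subset: str      # which portion was used (granularity)
    sha256: str      # fixity digest recorded at citation time

    def matches(self, data: bytes) -> bool:
        # True only if the bytes in hand are the bytes that were cited.
        return fixity(data) == self.sha256


# Two hypothetical versions of a small table; one value differs.
version_1 = b"id,value\n1,10\n2,20\n"
version_2 = b"id,value\n1,10\n2,21\n"

citation = VersionedCitation(
    identifier="doi:10.1234/example.dataset",  # hypothetical DOI
    version="1.0",
    subset="rows 1-2",
    sha256=fixity(version_1),
)

print(citation.matches(version_1))  # True
print(citation.matches(version_2))  # False
```

A byte-level digest flags any change, including harmless reformatting. A community might prefer a content-based fingerprint instead, which is one reason the principles leave the mechanism open.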