RDA, Data Citation, and PIDs for DataONE
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
Building Collaborative Bridges: Opportunities and Challenges for Data Sharing and Citation
Mark A. Parsons (ORCID 0000-0002-7723-0950), Secretary General
DataONE Webinar, 10 May 2016
All of society’s grand challenges require diverse (often large) data to be shared and integrated across cultures, scales, and technologies.
Research Data Alliance
Vision: Researchers and innovators openly share data across technologies, disciplines, and countries to address the grand challenges of society.
Mission: RDA builds the social and technical bridges that enable open sharing of data.
Infrastructure is relationships, interactions, and connections between people, technologies, and institutions
Fran Berman, Research Data Alliance
“Create - Adopt - Use” (in 12-18 months)
Systems Interoperability
Adopted Policy
Sustainable Economics
Common Types, Standards, Metadata
Traffic image: Mike Gonzalez
Adopted Community Practice
Training, Education, Workforce
Shared Principles
• Openness
• Consensus
• Balance
• Harmonization
• Community Driven
• Non-profit
[Chart: quarterly counts, presumably RDA membership, over 12 consecutive quarters (May-July, Aug-Oct, Nov-Jan, Feb-Apr, repeated three times): 392, 991, 1274, 1656, 2048, 2404, 2636, 2881, 3126, 3434, 3698, 3976]
South America 1%
North America 34%
Europe 48%
Australasia 5%
Asia 9%
Africa 3%
from 110 countries
https://rd-alliance.org/about-rda/who-rda.html
The RDA Community: ~4,000+ members from 110 countries (April 2016)
70+ Working and Interest Groups
RDA Organisational Members
Organisational & Affiliate Members
RDA Affiliate Members
https://rd-alliance.org/organisation/rda-organisation-affiliate-members.html
Fran Berman, Research Data Alliance
RDA: Accelerate Data Sharing and Interoperability Across Cultures, Communities, Scales, Technologies
▪ Technical parts of the data engine:
  ▪ Data type registries reference model
  ▪ Wheat data interoperability framework
▪ Rules of the road:
  ▪ Common agreement on data citation
  ▪ Common practice for data repositories
  ▪ Principles of legal interoperability
▪ Better drivers
  • Summer schools in data science and cloud computing in the developing world (with CODATA)
  • Active data management plan development and monitoring
Policy and Practice
Systems Interoperability
Sustainable Economics
Common Types, Standards, Metadata
Training, Education, Workforce
Some themes amidst the difference
1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything!
2. Certifying Trust in assertions, evidence, organisations, processes…
3. The value of Conversations, Relationships, and Mediation — an agile network effect.
An Area of Convergence and Agreement
Internet Domain
nodes with IP numbers
packages being exchanged
standardized protocols
Data Domain
objects with PID numbers
objects being exchanged
standardized protocols
Slide courtesy P. Wittenberg from L. Lannom from D. Clark
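The analogy between IP numbers and PIDs can be illustrated with a small sketch: a persistent identifier stays fixed while the location it resolves to may change. This is a toy in-memory registry, not the Handle System, DOI, or any real resolver API; the class name `PidRegistry`, the PID string, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PidRegistry:
    """Toy resolver (hypothetical): maps a persistent identifier to its current location."""

    records: Dict[str, str] = field(default_factory=dict)

    def register(self, pid: str, location: str) -> None:
        # A PID is assigned once and never reused for another object.
        if pid in self.records:
            raise ValueError(f"PID already registered: {pid}")
        self.records[pid] = location

    def relocate(self, pid: str, new_location: str) -> None:
        # The object moved; only the record changes, the PID stays the same.
        if pid not in self.records:
            raise KeyError(pid)
        self.records[pid] = new_location

    def resolve(self, pid: str) -> str:
        # Look up where the object currently lives.
        return self.records[pid]


registry = PidRegistry()
pid = "hypothetical-prefix/abc123"  # made-up identifier for illustration
registry.register(pid, "https://repo-a.example.org/objects/abc123")
first = registry.resolve(pid)

registry.relocate(pid, "https://repo-b.example.org/archive/abc123")
second = registry.resolve(pid)
```

A citation that records only `pid` keeps working after the move, which is the property the slide attributes to objects with PID numbers.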
Purpose of Data Citation
• Aid scientific reproducibility through direct, unambiguous connection to the precise data used.
• Credit for data authors and stewards
• Accountability for creators and stewards
• Track impact of data set
• Help identify data use (e.g., trackbacks)
• Data authors can verify how their data are being used.
• Users can better understand the application of the data.
• A locator/reference mechanism, not a discovery mechanism per se
Crisis of Confidence in Research Data Citation
The Evolution of Data Citation
• Data was part of the literature—tables, maps, monographs, etc.—and we cited accordingly. (Some data were still hoarded).
• Digital data becomes the norm. It’s messier and we forget how to cite it routinely.
• Initial efforts to define digital data citation in the 90s - early 00s
  • Right idea, little traction
  • Partially conflated with the issue of citing URLs
• A blossoming in the mid-late 00s.
  • Multiple disciplines start developing approaches and guidelines
  • DOI a big driver, especially for DataCite, but other identifiers used too (Handles, LSIDs, UNFs, ARKs and good ol’ URI/Ls)
  • A somewhat competitive atmosphere
• Finally consensus through the Joint Declaration of Data Citation Principles, 2013
Joint Declaration of Data Citation Principles (Overview)
The Noble Eight-Fold Path to Citing Data
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and flexibility
Principles are supplemented with a glossary, references and examples: http://force11.org/datacitation
Citing Dynamic Data
Data Citation: Data + Means-of-access
▪ Data → time-stamped & versioned (aka history)
Researcher creates working-set via some interface:
▪ Access → assign PID to QUERY, enhanced with
  − Time-stamping for re-execution against versioned DB
  − Re-writing for normalization, unique-sort, mapping to history
  − Hashing result-set: verifying identity/correctness
leading to landing page
S. Pröll, A. Rauber. Scalable Data Citation in Dynamic Large Databases: Model and Reference Implementation. In IEEE Intl. Conf. on Big Data 2013 (IEEE BigData2013), 2013. http://www.ifs.tuwien.ac.at/~andi/publications/pdf/pro_ieeebigdata13.pdf
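The approach on this slide can be sketched in a few lines: version the data, store the query with a time-stamp and a hash of its result set, and hand out a PID for the stored query. This is a minimal illustration under stated assumptions, not the Pröll and Rauber reference implementation: it uses SQLite, integer logical time-stamps, a single parameterised filter in place of general query re-writing, and a made-up PID format. The table names, function names, and the `hypothetical-pid/` prefix are all invented for the example.

```python
import hashlib
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obs (
    station TEXT NOT NULL,
    value REAL NOT NULL,
    valid_from INTEGER NOT NULL,  -- logical time this row version was written
    valid_to INTEGER              -- NULL while this version is still current
);
CREATE TABLE query_store (
    pid TEXT PRIMARY KEY,
    min_value REAL NOT NULL,      -- the (simplified) query: value >= min_value
    as_of INTEGER NOT NULL,       -- time-stamp used for re-execution
    result_hash TEXT NOT NULL     -- fingerprint of the result set when cited
);
""")


def write(station, value, now):
    """Version the data: close the current row for a station, then add the new one."""
    conn.execute(
        "UPDATE obs SET valid_to = ? WHERE station = ? AND valid_to IS NULL",
        (now, station),
    )
    conn.execute("INSERT INTO obs VALUES (?, ?, ?, NULL)", (station, value, now))


def run_as_of(min_value, as_of):
    """Run the filter against the row versions valid at `as_of`, in a fixed sort order."""
    return conn.execute(
        "SELECT station, value FROM obs "
        "WHERE value >= ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY station, value",
        (min_value, as_of, as_of),
    ).fetchall()


def result_hash(rows):
    """Hash the sorted result set so a later re-execution can be verified."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


def cite(min_value, as_of):
    """Store the query, its time-stamp, and its result hash; return a new PID."""
    pid = f"hypothetical-pid/{uuid.uuid4()}"
    rows = run_as_of(min_value, as_of)
    conn.execute(
        "INSERT INTO query_store VALUES (?, ?, ?, ?)",
        (pid, min_value, as_of, result_hash(rows)),
    )
    return pid


def resolve(pid):
    """Re-execute the stored query at its time-stamp and verify the result hash."""
    row = conn.execute(
        "SELECT min_value, as_of, result_hash FROM query_store WHERE pid = ?",
        (pid,),
    ).fetchone()
    if row is None:
        raise KeyError(pid)
    min_value, as_of, expected = row
    rows = run_as_of(min_value, as_of)
    if result_hash(rows) != expected:
        raise ValueError("re-executed result set does not match the cited one")
    return rows


# Usage: cite a subset, change the database, then resolve the citation.
write("A", 1.0, now=1)
write("B", 5.0, now=1)
cited_rows = run_as_of(2.0, as_of=2)
pid = cite(2.0, as_of=2)
write("B", 1.5, now=3)   # B is revised after the citation
write("C", 9.0, now=3)   # C is added after the citation
```

Resolving `pid` after the later writes still returns the subset as it stood at the cited time-stamp, while the same filter run at a later time-stamp returns different rows. The fixed `ORDER BY` stands in for the unique-sort step, so the hash does not depend on storage order.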
Output / Results http://bit.ly/1T1HHXI
▪ 14 Recommendations grouped into 4 phases:
  - Preparing data and query store
  - Persistently identifying specific data sets
  - Resolving PIDs
  - Upon modifications to the data infrastructure
▪ Still open for comment by members
▪ See RDA Magazine for overview and adoption cases
▪ Reference implementations (SQL, CSV, XML)
▪ Pilots
Getting involved
Individuals
✓ Observers
✓ Contributors
✓ Drivers
• Members
• WGs - IGs - BoFs
• Requests for Comments
• Plenaries
Organisations
✓ Insight
✓ Adopt
✓ Drive
• Member
• WGs - IGs - BoFs
• RfCs
• Funded projects
• Adoption/Uptake
National level
✓ Coordination & Knowledge Exchange, Strategy &/or Implementation
• Papers & Events
• Meetings & Fora
• Training & Workshops
• Uptake pilots
https://rd-alliance.org/get-involved.html
12-16 September 2016 in Denver, Colorado, USA
RDA Interest (IG) and Working Groups (WG) by Focus 1 (February 2016)
RDA Interest (IG) and Working Groups (WG) by Focus 2 (February 2016)