The Dryad Data Repository: Metadata Workflows and Processes
2nd Data Management Workshop
November 28th – 29th, 2014
University of Cologne, Germany
Jane Greenberg, Professor, College of Computing & Informatics (CCI); Director, Metadata Research Center <MRC>
Erin Clary, Dryad Curator, CCI/MRC
http://datadryad.org/
Pre-populated metadata field
Elsevier’s Science Direct: EXAMPLE: Dryad Unmack, et al, Phylogeny and biogeography…Molecular Phylogenetics and Evolution http://dx.doi.org/10.1016/j.ympev.2012.12.019.
Data downloads reuse citation
Observations, motivating study of metadata capital
1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Downloaded 10,678 times
Greenberg J, Swauger S, Feinstein EM (2013) Data from: Metadata capital in a data repository. Proceedings of the International Conference on Dublin Core and Metadata Applications http://dx.doi.org/10.5061/dryad.8c1p6
Journal | Re.Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. | Y |
Type Total 30 days
Data packages 6867 198
Data files 21056 977
Journals 364 77
Authors 24500 3492
Downloads 639314 36006
• Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
• X >10GB = $15,$10+
http://wiki.datadryad.org/Sample_Dryad_Content#Examples_by_file_type
Technology
• DSpace
• DOIs via CDL/DataCite
• CC0 (<m> + data)
• Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)
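The OAI-PMH route listed above can be illustrated with a short sketch. It assumes a hypothetical endpoint (`BASE_URL` is a placeholder, not a confirmed Dryad address) and a hand-written sample response; only the `ListRecords` verb, the `metadataPrefix=oai_dc` argument, and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The helper names are illustrative.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: a placeholder, not a confirmed repository address.
BASE_URL = "https://repository.example.org/oai/request"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def extract_titles(xml_text):
    """Return the dc:title of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

# Hand-written sample response (illustrative, not real repository output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Data from: An example study</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE))
```

A harvester would fetch the URL built by `list_records_url` and pass the response body to `extract_titles`; the network call is left out so the sketch runs offline.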
Governance
“non-profit status, 12 member Board of Directors”
• Sets policy, goals
• science, journals, societies, OCLC, MS
2006: Dryad development – NESCent + <MRC>
• Stakeholders: journals, publishers and scientific societies, and researchers
2009-2012: Interim Board
$ PAYMENT: Sept. 1, 2014
Sustainability: Plan Comparison
Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
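To compare the plans for a given journal, the rates in the comparison above can be turned into a small calculation. This is a rough sketch, not Dryad's billing logic: the function names are illustrative, the subscription fee is assumed to be a simple product of article count and rate, and the voucher plan is assumed to charge for at least the 25-voucher minimum.

```python
# Rates transcribed from the plan comparison above (USD).
PER_PACKAGE = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
}
PER_ARTICLE = {"member": 25, "non_member": 30}  # subscription plan
INDIVIDUAL_PER_PACKAGE = 80                     # pay on acceptance
VOUCHER_MINIMUM = 25                            # minimum voucher purchase

def voucher_cost(packages, status):
    # Assumption: depositing fewer than 25 packages still means buying 25 vouchers.
    return max(packages, VOUCHER_MINIMUM) * PER_PACKAGE["voucher"][status]

def deferred_cost(packages, status):
    return packages * PER_PACKAGE["deferred"][status]

def subscription_cost(articles, status):
    # Assumption: the annual fee is articles published times the per-article rate.
    return articles * PER_ARTICLE[status]

def individual_cost(packages):
    return packages * INDIVIDUAL_PER_PACKAGE

# Hypothetical member journal: 40 data packages, 100 published articles per year.
print(voucher_cost(40, "member"))        # 2600
print(deferred_cost(40, "member"))       # 2800
print(subscription_cost(100, "member"))  # 2500
print(individual_cost(40))               # 3200
```

Under these assumed volumes the subscription plan is cheapest, but the ranking depends on the ratio of data packages to published articles.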
More on growth and sustainability
Membership: http://datadryad.org/pages/membershipOverview
Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
Journal integration: http://datadryad.org/pages/journalIntegration
Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)
Singapore Framework
Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)
Vision
1. Simple: automatic metadata gen; heterogeneous datasets
   * Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
Helping Interdisciplinary Vocabulary Engineering (HIVE)
~~~~Amy
DATA
publication
Package metadata harvested from email
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)
Modified capital-sigma notation
Reuse
$$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$$
R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuse
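The formula above amounts to adding the value of each reuse to the record's initial value. A minimal sketch, with a hypothetical function name and made-up numbers (the slides give no actual values for R or a):

```python
def metadata_capital(record_value, increments):
    """R + sum(a_i): initial record value plus the value added by each reuse."""
    return record_value + sum(increments)

# Hypothetical numbers: a record worth 10 units, reused three times.
total = metadata_capital(10, [2, 2, 1])
print(total)  # 15
```

With no reuse the list of increments is empty and the capital equals R alone.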
Author/Submitter | Curator
100 metadata instantiations
• 8 of 12 metadata properties had reuse @ 50% or greater
• 5 of 8 confirmed reuse at 80% or higher
• Basic bib. vs. complex
Author
Subject
Dcterms.spatial
DwC.ScientificName
Conclusion…other Valuation Approaches
Market cap of Facebook per user: $40 – $300
Revenues per record per user: $4 – $7 per year
• Facebook
• Experian
Market prices of personal data:
• $0.50 for street address
• $2.00 for date of birth
• $8 for social security number
• $3 for driver’s license number
• $35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.
Concluding comments
• Success story
• Contribution, have to start somewhere…
• Good timing, the right discipline
• Confirmed use, reuse
• Machine capabilities
• An educative commons, intellectually engaging
http://wiki.datadryad.org/Sample_Dryad_Content
Acknowledgments
• Dryad Consortium Board, journal partners, and data authors
• NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
• Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
• U British Columbia: Michael Whitlock
• NCSU Digital Libraries: Kristin Antelman
• HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
• Yale/TreeBASE: Youjun Guo, Bill Piel
• DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
• British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
• Oxford University: David Shotton
http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki
http://code.google.com/p/[email protected]
Facebook: Dryad Twitter: @datadryad
http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc