Ag Data Commons: Adding value to open agricultural research data
Cynthia Parr @cydparr
US Department of Agriculture
National Agricultural Library
30 September 2015
Federal directives: Public access to open, machine-readable data
The problems in agricultural data
• Broad subject areas
• Journals not integrated with repositories like Dryad
• Too many existing databases & web distribution points
• Lack of infrastructure for long-tail data
• Lack of a neutral, sustainable solution for long-term multi-institutional projects
• Supports Public Access mandates
• Holds agricultural research data
• Primary audience: researchers
• Holds metadata for data held elsewhere
• Starting with USDA data but will broaden
• Both human and machine access
• Can include unpublished data that is ready for release
Ag Data Commons Prototyping FY 2015
A proposed solution
[Diagram: AG DATA COMMONS architecture, with INGESTION and DISSEMINATION sides. Components shown: Search & Knowledge Discovery; Thesaurus & Indexing; Ag Data Commons Repository; Organization & Curation; Grant management systems; PubAg; Dataset Submission; Analytics & Tools; Data.gov; Ag Data Commons Catalog; Distributed repositories; Forest Service Geospatial. Legend: Building, Adapting, Existing.]
Adding value
Metadata + data package
DOI
Links
Thesaurus tags
Idiosyncratic data dictionary
Search, services, compliance checking
DKAN http://nucivic.com/dkan/
PRO
• Open source community
• Drupal modules for basic CMS functions
• Integrated CKAN catalog
• Feeds Data.gov
• Basic metadata already supported
CON
• Not designed for scientific data or scientists
• No links to literature
• No Digital Object Identifiers
• Doesn’t handle dataset relationships
• Metadata inadequate for compliance checking & re-use
Metadata Standards
Core Metadata Schema
• POD 1.1 (Project Open Data) https://project-open-data.cio.gov/
Related Scientific Metadata & Data Standards (e.g.)
• ISO 19115 (GIS Data, FGDC) https://www.iso.org
• Darwin Core (Biodiversity standards) http://rs.tdwg.org/dwc/
• EML (Ecological Metadata Language) https://knb.ecoinformatics.org/#tools/eml
• MiXS GSC (Genomic Standards Consortium) http://gensc.org/projects/mixs-gsc-project/
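As a rough illustration of what compliance checking against the core schema could look like, the sketch below tests a dataset record for the dataset-level fields that POD 1.1 marks as required. It is a simplified assumption, not the Ag Data Commons implementation: the function name `missing_fields` and every value in the example record are hypothetical, and the federal-only fields (`bureauCode`, `programCode`) are left out.

```python
# Dataset-level fields that POD 1.1 lists as required for all publishers.
# Federal agencies must also supply bureauCode and programCode; this
# simplified sketch does not check those.
REQUIRED_FIELDS = [
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record: every value below is invented for illustration.
example = {
    "title": "Example soil moisture observations",
    "description": "Hourly soil moisture readings from a hypothetical field site.",
    "keyword": ["soil moisture", "field trial"],
    "modified": "2015-09-30",
    "publisher": {"name": "Example Research Unit"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.org"},
    "identifier": "https://example.org/dataset/1",
    "accessLevel": "public",
}

# The same record with its identifier dropped, to show a failing check.
incomplete = {key: value for key, value in example.items() if key != "identifier"}

print(missing_fields(example))
print(missing_fields(incomplete))
```

A real check would also validate field formats (dates, access levels, contact structure), which this sketch ignores.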
Controlled Vocabularies
• NALT – National Agricultural Library Thesaurus http://agclass.nal.usda.gov
• GACS – Global Agricultural Concept Scheme
• Taxonomy
• Gene Ontology (GO) http://geneontology.org/
• ENVO, ecological, economic, etc.
Relevant for Agriculture
• Help create a semantic web
• SKOS (Simple Knowledge Organization System): W3C recommendation, or RDF
Credit: AIMS--FAO
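To make the SKOS point concrete, here is a minimal sketch of how a thesaurus term and its broader term could be written as SKOS triples and serialized as Turtle. The concept URIs and labels are hypothetical placeholders under example.org, not actual NALT or GACS identifiers; only the SKOS namespace and the `skos:Concept`, `skos:prefLabel`, and `skos:broader` terms come from the W3C vocabulary.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_concept(uri, pref_label, broader=None, lang="en"):
    """Return (subject, predicate, object) triples describing one SKOS concept."""
    subject = f"<{uri}>"
    triples = [
        (subject, "a", "skos:Concept"),
        (subject, "skos:prefLabel", f'"{pref_label}"@{lang}'),
    ]
    if broader:
        # Link the concept to its parent term in the hierarchy.
        triples.append((subject, "skos:broader", f"<{broader}>"))
    return triples


def to_turtle(triples):
    """Serialize triples as simple Turtle, one statement per line."""
    lines = [SKOS_PREFIX, ""]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


# Hypothetical URIs and labels, for illustration only.
triples = skos_concept(
    "http://example.org/thesaurus/maize",
    "maize",
    broader="http://example.org/thesaurus/cereals",
)
print(to_turtle(triples))
```

In practice an RDF library would handle escaping and serialization; plain strings are used here only to keep the sketch dependency-free.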
https://data.nal.usda.gov/
Launching next week
Adding even more value
Structured methods metadata
Shared data dictionary
Semantic data dictionary
Adding even more value
Assist application launch
Find related data
Integrate/link related data
= help build the knowledge graph
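One simple way to "find related data" is to compare the controlled-vocabulary tags attached to each dataset. The sketch below assumes each dataset carries a set of thesaurus tags and ranks other datasets by Jaccard overlap. The dataset IDs, tags, function names, and threshold are all hypothetical; the slides do not say how related data would actually be found.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related(catalog, dataset_id, threshold=0.25):
    """Return (dataset_id, score) pairs at or above threshold, best first."""
    target = catalog[dataset_id]
    scored = [
        (other, jaccard(target, tags))
        for other, tags in catalog.items()
        if other != dataset_id
    ]
    matches = [(other, score) for other, score in scored if score >= threshold]
    # Sort by descending score, then by ID so ties come out in a stable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical catalog: IDs and tags are invented for illustration.
catalog = {
    "ds-a": {"maize", "soil moisture", "irrigation"},
    "ds-b": {"maize", "irrigation", "yield"},
    "ds-c": {"forest inventory", "geospatial data"},
}

print(related(catalog, "ds-a"))
```

Each pair returned this way is a candidate edge between two datasets, which is one small step toward the knowledge graph named above.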
Acknowledgements
Susan McCarthy, NAL – KSD
Ursula Pieper, NAL – ISD
Qing Qu, NAL – KSD contractor
Jeff Campbell, NAL – KSD
Jaylen Nathwani, NAL – student intern
NüCivic, Angry Cactus Team
Jocelyn McNamara, NAL – KSD contractor
Kerry Huller – UMD graduate fellow
Erin Antognoli – UMD graduate fellow