A Standards-Based Model for Metadata Exchange
BD2K All-Hands Meeting – November 29th, 2016
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen
Stanford University, Stanford, CA, USA
metadatacenter.org
Reproducibility Problem in Science
Metadata Key to Addressing Problem
• Crucial for reproducibility in biomedicine
– Locate experimental datasets online
– Understand how the experiments were performed
– Reuse the data to perform new analyses
• Journals and funding agencies increasingly require making experimental data and metadata available
Many Metadata Standards have been Developed
However: Metadata Submission is Hard
Metadata Submission is Hard - II
[Figure labels: Metadata; Summary Data Matrix; Raw Data; Submission Interface]
Result: Poor Metadata
Variants of ‘age’ metadata field in Gene Expression Omnibus (GEO) repository
age; Age; AGE; `Age
age (after birth); age (in years)
age (y); age (year); age (years); Age (years); Age (Years)
age (yr); age (yr-old)
age (yrs); Age (yrs)
age [y]; age [year]; age [years]; age in years
age of patient; Age of patient
age of subjects; age(years); Age(years); Age(yrs.); Age, year; age, years
age, yrs; age.year
age_years
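To illustrate why such free-text field names are hard to search, here is a rough Python sketch that flags a raw field name as an "age" variant. The helper `is_age_field` and its regular expression are hypothetical and written only for this example; they are not part of GEO or CEDAR, and a heuristic like this would not replace a controlled field definition.

```python
import re

# Hypothetical pattern: optional leading non-letters (e.g. a stray backtick),
# then the word "age", not immediately followed by another letter.
_AGE_FIELD = re.compile(r"^[^a-z]*age(?![a-z])", re.IGNORECASE)


def is_age_field(name: str) -> bool:
    """Return True if a raw field name looks like a variant of 'age'."""
    return _AGE_FIELD.match(name.strip()) is not None


# A sample of the variants listed on the slide.
variants = [
    "age", "Age", "AGE", "`Age", "age (after birth)", "age (in years)",
    "age (y)", "Age (Years)", "age (yr-old)", "age [years]", "age in years",
    "Age of patient", "age of subjects", "Age(yrs.)", "Age, year",
    "age.year", "age_years",
]
matched = [v for v in variants if is_age_field(v)]
```

Every sampled variant is matched, while unrelated names such as "agent" or "stage" are not. The sketch only recognizes the field; it says nothing about the unit or meaning of the value, which is the gap a shared template is meant to close.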
Our Solution: CEDAR - A Metadata Ecosystem
• Overcome the impediments to creating high-quality metadata
• Facilitate
– Creation
– Acquisition
– Use
– Evaluation
– Refinement
• Key goal: create a sharable metadata exchange format – a template model - for publishing, searching, exchanging metadata
CEDAR Template Model Goals
• Must describe composite structure of templates
• Implemented using standard formats
• Express semantics
• Metadata instances:
– Linked to controlled terms
– Easily serializable
– Easily validated
– Easily indexed
– Interchangeable with RDF
– Highly readable
– Produced/consumed via REST APIs and usable in JavaScript front ends
– Meets FAIR goals
Using JSON Schema and JSON-LD for CEDAR Template Model
[Diagram labels: JSON Schema + JSON-LD; JSON-LD]
What is JSON Schema?
• Technology for describing and validating the structure of JSON documents
• Provides a structural description of any JSON document
• JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas
• Analogous to XML Schema
What is JSON-LD?
• A lightweight syntax to serialize Linked Data in JSON
• Allows existing JSON to be interpreted as Linked Data with minimal changes
• JSON-LD is primarily intended to be a way to:
– use Linked Data in Web-based programming environments
– build interoperable Web services
– store Linked Data in JSON-based storage engines
• Core contribution: add semantics to JSON documents
• W3C Recommendation: https://www.w3.org/TR/json-ld/
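As a rough illustration of "existing JSON interpreted as Linked Data with minimal changes", the Python sketch below adds an `@context` to a plain JSON object and replaces each key with the IRI it maps to. The function `expand_terms` is a hypothetical toy written for this example: it handles only flat term-to-IRI mappings and is not a JSON-LD processor.

```python
from typing import Any, Dict


def expand_terms(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Toy expansion: replace top-level keys with the IRIs given in @context.

    Covers only flat string-to-IRI mappings; real JSON-LD processing
    handles far more (nested contexts, @vocab, type coercion, and so on).
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords such as @context are not terms; skip them
        if key in context:
            expanded[context[key]] = value
        # Keys with no mapping carry no Linked Data meaning and are skipped.
    return expanded


# Existing JSON, unchanged.
plain = {"name": "Stanford", "zip": "94305"}

# The same JSON with a context added; the data keys themselves are untouched.
linked = {
    "@context": {
        "name": "https://schema.org/name",
        "zip": "https://schema.org/postalCode",
    },
    **plain,
}

print(expand_terms(linked))
```

Without a context, the same keys map to nothing and `expand_terms(plain)` returns an empty dictionary, which is the sense in which the context is what adds the semantics.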
Using JSON Schema to Define Template Structure
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}
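The effect of the `required` and `additionalProperties` keywords in a template like this can be sketched in a few lines of Python. The function `check_structure` is a hypothetical toy that covers only these two keywords and is not a JSON Schema validator. Because the property definitions are elided on the slide, empty schemas stand in for them here.

```python
from typing import Any, Dict, List


def check_structure(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Toy check of two JSON Schema keywords: required, additionalProperties."""
    errors = []
    allowed = set(schema.get("properties", {}))
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    if schema.get("additionalProperties", True) is False:
        for name in instance:
            if name not in allowed:
                errors.append(f"unexpected property: {name}")
    return errors


# Assumption: empty schemas replace the elided {...} property definitions.
study_schema = {
    "type": "object",
    "properties": {"title": {}, "description": {}, "principalInvestigator": {}},
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}

good = {"title": {}, "description": {}, "principalInvestigator": {}}
bad = {"title": {}, "funder": {}}  # "funder" is a made-up extra field

print(check_structure(good, study_schema))
print(check_structure(bad, study_schema))
```

The first call reports no errors. The second reports the two missing required properties and the unexpected `funder` field, which is how a template can reject the kind of ad hoc fields that lead to poor metadata.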
Using JSON-LD to add Semantics to Metadata Instances
{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
Using JSON-LD to add Semantics to Metadata Instances - II
{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
CEDAR Metadata Instances can be transformed to an RDF Graph
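A minimal Python sketch of the idea, under simplifying assumptions: every node carries an `@id`, every leaf has the form `{"@value": ...}`, and the context is a flat term-to-IRI map. The function `to_triples` is a hypothetical toy and not CEDAR's conversion or a full JSON-LD-to-RDF algorithm. The sample data reuses the principal investigator fragment of the instance above.

```python
from typing import Any, Dict, Iterator, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(node: Dict[str, Any], context: Dict[str, str]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples from a toy JSON-LD node."""
    subject = node["@id"]
    if "@type" in node:
        yield (subject, RDF_TYPE, node["@type"])
    for key, value in node.items():
        if key.startswith("@") or key not in context:
            continue  # skip keywords and terms with no mapping
        predicate = context[key]
        if "@value" in value:
            # Literal value: quote it to distinguish it from an IRI.
            yield (subject, predicate, f'"{value["@value"]}"')
        else:
            # Nested node: link to its @id, then emit its own triples.
            yield (subject, predicate, value["@id"])
            yield from to_triples(value, context)


context = {
    "name": "https://schema.org/name",
    "institution": "https://myschema.org/property/hasInstitution",
}
person = {
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "@type": "https://schema.org/Person",
    "name": {"@value": "Dr. P.I"},
    "institution": {
        "@id": "https://repo.metadatacenter.org/template_elements/37",
        "@type": "https://schema.org/Organization",
        "name": {"@value": "Stanford"},
    },
}

triples = list(to_triples(person, context))
for triple in triples:
    print(triple)
```

Each `@id` becomes a graph node, each mapped key becomes an edge labeled with its IRI, and each `@value` becomes a literal, so the nested JSON document flattens into a set of subject-predicate-object statements.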
Model drives CEDAR Workbench
CEDAR Template Model
Controlled terminologies
More Information
• CEDAR Demos:
– CEDAR: Easing Authoring of Metadata to Make Biomedical Data Sets More Findable and Reusable
– Faster and Better Metadata Authoring using CEDAR's Value Recommendations
• Posters:
– SAP – a CEDAR-based pipeline for semantic annotation of biomedical metadata
– Increasing NCBO BioPortal and CEDAR Synergy for BD2K
– The smartAPI initiative: making Web APIs FAIR
• Collaborator Posters:
– Leveraging the CEDAR Workbench for ontology-linked submission of AIRR data to the NCBI-SRA
– FAIR LINCS Data and Metadata powered by the CEDAR Framework