A Standards-Based Model for Metadata Exchange
BD2K All-Hands Meeting, November 29th, 2016
Martin O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, and Mark A. Musen
Stanford University, Stanford, CA, USA
metadatacenter.org

Reproducibility Problem in Science

Metadata Key to Addressing Problem

• Crucial for reproducibility in biomedicine
  – Locate experimental datasets online
  – Understand how the experiments were performed
  – Reuse the data to perform new analyses

• Journals and funding agencies increasingly require making experimental data and metadata available

Many Metadata Standards have been Developed

However: Metadata Submission is Hard

Metadata Submission is Hard - II

[Figure labels: Metadata; Summary Data Matrix; Raw Data; Submission Interface]

Result: Poor Metadata

Variants of ‘age’ metadata field in Gene Expression Omnibus (GEO) repository

age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
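
To make the scale of the problem concrete, the sketch below collapses a handful of the variants from this slide to one key. It is only an after-the-fact cleanup heuristic, not anything CEDAR does: the function name `normalize_field_name` and its stop list are hypothetical, chosen for this illustration.

```python
import re
from collections import Counter

# A handful of the GEO field-name variants from the slide.
VARIANTS = [
    "age", "Age", "AGE", "`Age", "age (in years)", "age (y)", "Age (Years)",
    "age (yr-old)", "age [years]", "age of patient", "Age(yrs.)",
    "age, years", "age.year", "age_years",
]

# Hypothetical stop list: unit spellings and filler words that add no meaning here.
NOISE = {"y", "yr", "yrs", "year", "years", "in", "old", "of", "patient"}


def normalize_field_name(raw: str) -> str:
    """Collapse a free-text field name to a crude canonical key."""
    # Lowercase, then treat every run of non-letters as a word separator.
    words = re.sub(r"[^a-z]+", " ", raw.lower()).split()
    return "_".join(w for w in words if w not in NOISE)


counts = Counter(normalize_field_name(v) for v in VARIANTS)
print(counts)
```

All fourteen spellings map to the single key `age`. A heuristic like this has to be re-tuned for every field and every repository, which is why preventing the variation at authoring time is more attractive than repairing it afterwards.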

Our Solution: CEDAR - A Metadata Ecosystem

• Overcome the impediments to creating high-quality metadata

• Facilitate
  – Creation
  – Acquisition
  – Use
  – Evaluation
  – Refinement

• Key goal: create a sharable metadata exchange format (a template model) for publishing, searching, and exchanging metadata

CEDAR Template Model Goals

• Must describe composite structure of templates

• Implemented using standard formats

• Express semantics

• Metadata instances:
  – Linked to controlled terms
  – Easily serializable
  – Easily validated
  – Easily indexed
  – Interchange with RDF
  – Highly readable
  – Produced/consumed via REST APIs and usable in JavaScript front ends
  – Meets FAIR goals

Using JSON Schema and JSON-LD for CEDAR Template Model

[Figure labels: JSON Schema + JSON-LD; JSON-LD]

What is JSON Schema?

• Technology for describing and validating the structure of JSON documents

• Provides a structural description of any JSON document

• JSON documents that are specified with JSON Schema can be structurally validated against their associated schemas

• Analogous to XML Schema
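
Structural validation can be sketched in a few lines. The example below is a hand-rolled check that understands only three JSON Schema keywords (`type: object`, `required`, and `additionalProperties: false`). It is not a JSON Schema implementation, and the names `STUDY_SCHEMA` and `structural_errors` are hypothetical. In practice a full validator library, such as the `jsonschema` package for Python, would be used instead.

```python
from typing import Any, Dict, List

# Hypothetical schema for a "Study" document; property bodies are left empty.
STUDY_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"title": {}, "description": {}, "principalInvestigator": {}},
    "required": ["title", "description", "principalInvestigator"],
    "additionalProperties": False,
}


def structural_errors(instance: Any, schema: Dict[str, Any]) -> List[str]:
    """Report violations of `required` and `additionalProperties: false` only."""
    if not isinstance(instance, dict):
        return ["instance is not a JSON object"]
    errors: List[str] = []
    # Every property named in `required` must be present.
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    # With additionalProperties set to false, undeclared properties are rejected.
    if schema.get("additionalProperties") is False:
        declared = set(schema.get("properties", {}))
        for key in instance:
            if key not in declared:
                errors.append(f"unexpected property: {key}")
    return errors


good = {"title": "Study A", "description": "A study", "principalInvestigator": {}}
bad = {"title": "Study A", "sponsor": "NIH"}

print(structural_errors(good, STUDY_SCHEMA))
print(structural_errors(bad, STUDY_SCHEMA))
```

The first call reports no errors. The second reports two missing required properties and one undeclared property, which is the kind of feedback a submission interface can surface before metadata reaches a repository.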

What is JSON-LD?

• A lightweight syntax to serialize Linked Data in JSON

• Allows existing JSON to be interpreted as Linked Data with minimal changes

• JSON-LD is primarily intended to be a way to:
  – use Linked Data in Web-based programming environments
  – build interoperable Web services
  – store Linked Data in JSON-based storage engines

• Core contribution: add semantics to JSON documents

• W3C Recommendation: https://www.w3.org/TR/json-ld/

Using JSON Schema to Define Template Structure

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "@type": "https://repo.metadatacenter.org/core/Template",
  "@id": "https://repo.metadatacenter.org/templates/434334",
  "title": "Study",
  "description": "Study template",
  "type": "object",
  "_ui": {...},
  "properties": {
    "title": {...},
    "description": {...},
    "principalInvestigator": {...}
  },
  "required": ["title", "description", "principalInvestigator"],
  "additionalProperties": false
}

Using JSON-LD to add Semantics to Metadata Instances

{
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
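
Each field value in this instance is wrapped in a JSON-LD value object (`{"@value": ...}`). A consumer that only wants plain JSON can strip the wrappers. The following is a minimal sketch; `unwrap` is a hypothetical helper and not part of CEDAR or of any JSON-LD library.

```python
import json


def unwrap(node):
    """Replace {"@value": x} wrappers with x and drop other '@' keywords."""
    if isinstance(node, dict):
        if "@value" in node:
            return node["@value"]
        return {k: unwrap(v) for k, v in node.items() if not k.startswith("@")}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


# The instance from the slide, with the elided description kept as written.
instance = {
    "title": {"@value": "Immune biomarkers study"},
    "description": {"@value": "Immune biomarkers …"},
    "principalInvestigator": {
        "name": {"@value": "Dr. P.I"},
        "institution": {
            "name": {"@value": "Stanford"},
            "zip": {"@value": "94305"},
        },
    },
}

plain = unwrap(instance)
print(json.dumps(plain, indent=2, ensure_ascii=False))
```

The result keeps the nesting of the instance but holds bare strings, so code written for ordinary JSON can read it directly.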

Using JSON-LD to add Semantics to Metadata Instances - II

{
  "@type": "http://semantic-dicom.org/dcm#Study",
  "@id": "https://repo.metadatacenter.org/template_instances/55417",
  "@context": {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "zip": "https://schema.org/postalCode",
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution"
  },
  "title": { "@value": "Immune biomarkers study" },
  "description": { "@value": "Immune biomarkers …" },
  "principalInvestigator": {
    "@type": "https://schema.org/Person",
    "@id": "https://repo.metadatacenter.org/template_elements/557",
    "name": { "@value": "Dr. P.I" },
    "institution": {
      "@type": "https://schema.org/Organization",
      "@id": "https://repo.metadatacenter.org/template_elements/37",
      "name": { "@value": "Stanford" },
      "zip": { "@value": "94305" }
    }
  }
}
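
The role of `@context` can be illustrated by replacing each short property name with the IRI it maps to. The sketch below is a deliberate simplification of JSON-LD expansion: it handles only flat term-to-IRI mappings and does not produce the array-based form a real JSON-LD processor emits. `expand_keys` is a hypothetical helper, and the context key for the investigator is assumed to be spelled `principalInvestigator` so that it matches the property name used in the instance.

```python
CONTEXT = {
    "title": "https://schema.org/title",
    "name": "https://schema.org/name",
    "zip": "https://schema.org/postalCode",
    # Assumption: the term is spelled exactly like the instance property.
    "principalInvestigator": "https://myschema.org/property/hasPI",
    "institution": "https://myschema.org/property/hasInstitution",
}

INSTANCE = {
    "@id": "https://repo.metadatacenter.org/template_instances/55417",
    "title": {"@value": "Immune biomarkers study"},
    "principalInvestigator": {
        "@id": "https://repo.metadatacenter.org/template_elements/557",
        "name": {"@value": "Dr. P.I"},
        "institution": {
            "@id": "https://repo.metadatacenter.org/template_elements/37",
            "name": {"@value": "Stanford"},
            "zip": {"@value": "94305"},
        },
    },
}


def expand_keys(node, context):
    """Swap short property names for their IRIs; keep '@' keywords unchanged."""
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context itself is not part of the expanded data
        if key.startswith("@"):
            out[key] = value
        elif key in context:
            out[context[key]] = expand_keys(value, context)
        # A term with no context entry is dropped, as in JSON-LD expansion
        # when no default vocabulary is set.
    return out


expanded = expand_keys(INSTANCE, CONTEXT)
print(sorted(expanded))
```

After expansion, two documents that use different short names for the same IRI become directly comparable, which is the interoperability benefit the context provides.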

CEDAR Metadata Instances can be transformed to an RDF Graph
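
As a rough illustration of that transformation, the sketch below walks an instance whose properties are already written as full IRIs and emits N-Triples lines. It assumes every node has an `@id`, every `@type` is a single IRI, and every literal is a plain string. `to_triples` and `literal` are hypothetical helpers and not CEDAR's implementation; a real pipeline would rely on a JSON-LD processor.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Part of the study instance, with property names already replaced by IRIs.
STUDY = {
    "@id": "https://repo.metadatacenter.org/template_instances/55417",
    "@type": "http://semantic-dicom.org/dcm#Study",
    "https://schema.org/title": {"@value": "Immune biomarkers study"},
    "https://myschema.org/property/hasPI": {
        "@id": "https://repo.metadatacenter.org/template_elements/557",
        "@type": "https://schema.org/Person",
        "https://schema.org/name": {"@value": "Dr. P.I"},
    },
}


def literal(text):
    """Quote a plain string literal (assumes no newlines or control characters)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_triples(node):
    """Yield (subject, predicate, object) terms for a node and its nested nodes."""
    subject = f"<{node['@id']}>"
    if "@type" in node:
        yield (subject, f"<{RDF_TYPE}>", f"<{node['@type']}>")
    for key, value in node.items():
        if key.startswith("@"):
            continue
        if "@value" in value:
            # Value object: becomes a literal.
            yield (subject, f"<{key}>", literal(value["@value"]))
        else:
            # Nested node: link to it, then describe it.
            yield (subject, f"<{key}>", f"<{value['@id']}>")
            yield from to_triples(value)


triples = list(to_triples(STUDY))
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Each nested object with an `@id` becomes its own subject in the graph, so the study, the investigator, and (in the full instance) the institution can each be queried independently.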

Model drives CEDAR Workbench

[Figure labels: CEDAR Template Model; Controlled terminologies]

More Information

• CEDAR Demos:
  – CEDAR: Easing Authoring of Metadata to Make Biomedical Data Sets More Findable and Reusable
  – Faster and Better Metadata Authoring using CEDAR's Value Recommendations

• Posters:
  – SAP – a CEDAR-based pipeline for semantic annotation of biomedical metadata
  – Increasing NCBO BioPortal and CEDAR Synergy for BD2K
  – The smartAPI initiative: making Web APIs FAIR

• Collaborator Posters:
  – Leveraging the CEDAR Workbench for ontology-linked submission of AIRR data to the NCBI-SRA
  – FAIR LINCS Data and Metadata powered by the CEDAR Framework