The ELIXIR of Linked Data

European Life Sciences Infrastructure for Biological Information www.elixir-europe.org The ELIXIR of Linked Data Professor Carole Goble (UK node) Barend Mons (NL node) , Helen Parkinson (EMBL-EBI node) The Interoperability Services Backbone Team


Post on 09-Mar-2018


Page 1: The ELIXIR of Linked Data - Open  · PDF fileThe ELIXIR of Linked Data ... (and Elasticsearch) Search, Index and Linked Data. Biological ... Getting data providers to generate LOD

European Life Sciences Infrastructure for Biological Information

www.elixir-europe.org

The ELIXIR of Linked Data

Professor Carole Goble (UK node), Barend Mons (NL node), Helen Parkinson (EMBL-EBI node)

The Interoperability Services Backbone Team

Page 2

What is ELIXIR?

An international distributed infrastructure for life-science information

orchestrate the collection, quality control and archiving of biological data produced by life science experiments.

integrate research data

ensure a seamless service provision that is easily accessible to all.

http://www.elixir-europe.org/about

Page 3

ELIXIR: An international distributed infrastructure for biological data

Hub

major bioinformatics service providers (~130)

16 ELIXIR members

4 observers

Page 4

Drivers: Infrastructure Providers

COordinated Research Infrastructures Building Enduring Life-science Services

Marine metagenomics

Human data

Crop and forest plants

Rare diseases

Page 5

Rare diseases

Genomic data (WES, WGS)

Other omics data (transcriptomics, metabolomics, proteomics …)

Sample data (biobank databases)

Clinical data (registries, and phenotypic databases)

1000 exomes + > 2500 from other projects

Page 6

Drug prioritization for Huntington’s Disease

Katerina Nosikova, Elizaveta Besedina, Eelke van der Horst, Peter-Bram ‘t Hoen, Marco Roos, Eleni Mina, Human Genetics department, LUMC, NL

Select genes by phenotype matching in Monarch

Select drug compounds in Open PHACTS

Filter on feasibility for treating HD

Prioritized drug compounds
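
The four steps of this workflow can be read as a chain of filters. The sketch below is only an illustration of that shape: it uses small hypothetical in-memory tables (all gene, phenotype and compound names are placeholders) rather than the Monarch or Open PHACTS services, and ranking compounds by the number of selected genes they target is an assumption, not the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical gene -> phenotype annotations (placeholder identifiers).
GENE_PHENOTYPES: Dict[str, Set[str]] = {
    "GENE_A": {"PH_1", "PH_2"},
    "GENE_B": {"PH_3"},
}

# Hypothetical compound -> target gene table (placeholder identifiers).
COMPOUND_TARGETS: Dict[str, Set[str]] = {
    "CMPD_X": {"GENE_A"},
    "CMPD_Y": {"GENE_A", "GENE_B"},
    "CMPD_Z": {"GENE_B"},
}

# Hypothetical set of compounds judged feasible for treatment.
FEASIBLE: Set[str] = {"CMPD_Y", "CMPD_Z"}


def select_genes(disease_phenotypes: Set[str]) -> Set[str]:
    """Step 1: keep genes sharing at least one phenotype with the disease."""
    return {g for g, ph in GENE_PHENOTYPES.items() if ph & disease_phenotypes}


def select_compounds(genes: Set[str]) -> Dict[str, int]:
    """Step 2: map each compound to the number of selected genes it targets."""
    hits = {c: len(t & genes) for c, t in COMPOUND_TARGETS.items()}
    return {c: n for c, n in hits.items() if n > 0}


def prioritize(disease_phenotypes: Set[str]) -> List[str]:
    """Steps 3 and 4: filter on feasibility, then rank by target count."""
    hits = select_compounds(select_genes(disease_phenotypes))
    feasible = {c: n for c, n in hits.items() if c in FEASIBLE}
    return sorted(feasible, key=lambda c: (-feasible[c], c))
```

Calling `prioritize({"PH_1", "PH_3"})` runs all four steps on the placeholder tables.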

Page 7

What is ELIXIR?

Page 8

Technical platforms

Data: Secure and deliver core data resources

Tools: Discoverable tools, services and connectors for data access and exploitation

Compute: Robust technical platforms and clouds for secure data access, data exchange and compute

Training: Training programme for professionals, bridging the computational biology skills gap

Standards: Data management, reuse and integration

Findable, Accessible, Interoperable, Reusable

Page 9

Training: BYODs, data wrangling, governance and quality assurance

Linked Data experts, data experts from MycoBase and Human Protein Atlas

http://www.macs.hw.ac.uk/~ajg33/first-byod-workshop/

Tomato genome, phenotypic observations, variants

Page 10

[Figure: Indicators. Scientific focus, Community, Quality, Legal & funding infrastructure, Scientific impact]

Data: Basket of indicators, reflecting the multiple facets of bioinformatics resources

1) Scientific focus and quality of science, e.g. curational effort, benchmarking

2) Community served by the resource, e.g. web statistics

3) Quality of service, e.g. uptime, user support and training

4) Legal and funding infrastructure, e.g. institutional support, use policy

5) Impact and translational stories

Mandatory and optional

Page 11
Page 12

Compute Platform: Authentication, Archiving and Movement

Page 13

Tools Interoperability and APIs

Describing Tools

EDAM Ontology

Describing Workflows

Common format for bioinformatics tool execution

http://commonwl.org/

Rich: Linked Data allows for infinite metadata annotations and reasoning

SWAGGER.json

Describing APIs

API changes: Semantic versioning

Getting resources to have APIs
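
As a minimal sketch of what semantic versioning offers API consumers: with MAJOR.MINOR.PATCH version numbers, a client written against one version can expect later versions with the same major number to stay compatible. The function names below are illustrative assumptions, and the sketch ignores pre-release tags and the special case of major version 0.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(client_built_for: str, server_offers: str) -> bool:
    """Return True if a client built for one version can use the server.

    Assumes plain semantic versioning: same major version, and the server
    is at least as new as the version the client was written against.
    """
    wanted = parse_version(client_built_for)
    offered = parse_version(server_offers)
    return wanted[0] == offered[0] and offered >= wanted


if __name__ == "__main__":
    print(is_compatible("1.2.0", "1.4.1"))  # minor and patch bump
    print(is_compatible("1.2.0", "2.0.0"))  # major bump
```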

Page 14

[Luiz Olavo Bonino, DTL] RD-CONNECT, ODEXA4ALL

A FAIRifying Architecture

Page 15

Warehouses

Preparing Sources

On boarding

Datasets, Content, API

Access from Integrating Frameworks

Interoperability Services: Identifiers, Ontologies, Schemas.

API

FAIR Interoperability Backbone Services

Prepare for interop

Page 16

• Various species: maize, pine, potato…

• Various data types: from genomes (sequences and annotations) to phenomes (traits)

• Various ontologies: Crop Ontology, Plant Ontology…

• Emerging standards: MIAPPE (Minimum Information About a Plant Phenotyping Experiment)

Need for infrastructure

o Manage identifiers

o Register/access services and data sets

o Metadata driven search

© Paul Kersey

Crop and forest plants

Page 17

Ontology Services

Ontology mapping

Data-Ontology Tools

OLS3

Page 18

Identifiers – the pivot of everything!

Identifier Mapping Service (IMS)

Identifier Resolution Service (IRS2)
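
To make the two roles concrete, here is a minimal sketch, assuming a resolution step that turns a compact `prefix:accession` identifier into a URL and a mapping step that returns identifiers known to denote the same entity. The prefixes, URL templates and identifiers are all hypothetical placeholders; this is not the interface of the IMS or IRS2.

```python
from typing import Dict, FrozenSet, List

# Hypothetical URL templates per identifier prefix (not real endpoints).
URL_TEMPLATES: Dict[str, str] = {
    "dba": "https://example.org/dba/{accession}",
    "dbb": "https://example.org/dbb/entry?id={accession}",
}

# Hypothetical equivalence sets: identifiers that denote the same entity.
EQUIVALENCE_SETS: List[FrozenSet[str]] = [
    frozenset({"dba:A123", "dbb:B-77"}),
]


def resolve(compact_id: str) -> str:
    """Resolution: turn 'prefix:accession' into a URL via the prefix template."""
    prefix, accession = compact_id.split(":", 1)
    return URL_TEMPLATES[prefix].format(accession=accession)


def equivalents(compact_id: str) -> List[str]:
    """Mapping: return other identifiers for the same entity, sorted."""
    for group in EQUIVALENCE_SETS:
        if compact_id in group:
            return sorted(group - {compact_id})
    return []
```

Keeping the two lookups separate means a mapping can be recorded even when one side has no resolvable URL yet.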

Page 19

FAIR Metadata at many levels

Tool that provisioned the dataset

Dataset Collection

Dataset Profile

Data record content

mappings between entities

mappings between datasets

Interface API and Access

Tool using the dataset

Page 20

What is ELIXIR?

Page 21

Metadata Profiles and Dataset Registration

Governance, Compliance, Release Protocols

Dataset Profile

Page 22

Data

Diabetic nephropathy (EFO_0000401)

Data

BioSolr (and Elasticsearch)

Search, Index and Linked Data
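
One way a search index can make use of ontology annotations is query expansion: a search for a term also returns records annotated with more specific terms. The sketch below assumes a tiny hypothetical hierarchy and record set. Only EFO_0000401 comes from the slide; the EX_* identifiers and record names are placeholders, and this is not the BioSolr or Elasticsearch API.

```python
from typing import Dict, List, Set

# Hypothetical parent -> children hierarchy. Only EFO_0000401 comes from the
# slide; the EX_* identifiers are placeholders, not real ontology terms.
CHILDREN: Dict[str, List[str]] = {
    "EX_0001": ["EFO_0000401", "EX_0002"],
    "EX_0002": ["EX_0003"],
}

# Hypothetical records, each annotated with one ontology term.
RECORDS: Dict[str, str] = {
    "record-a": "EFO_0000401",
    "record-b": "EX_0003",
    "record-c": "EX_0009",
}


def expand(term: str) -> Set[str]:
    """Return the term plus all of its descendants in CHILDREN."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term: str) -> List[str]:
    """Return ids of records annotated with the term or a descendant."""
    wanted = expand(term)
    return sorted(rid for rid, tag in RECORDS.items() if tag in wanted)
```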

Page 23

Biological knowledge bases

Curated and annotated biological entities and their relationships

Uniprot, Ensembl, ChEMBL, Orphanet

Two tiers of data repository

Page 24

Two tiers of data repository

Biological knowledge bases

Curated and annotated biological entities and their relationships

Uniprot, Ensembl, ChEMBL, Orphanet

data records are dynamic and incomplete

records update, diverge, merge

over time, interpretation changes

identifier resolution varies over time – relationships between records are unstable

“reproducibility” potentially compromised

a novel gene-rare disease relationship is reported

consequences of a single nucleotide change in a regulatory genomic region are better understood.
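
A minimal sketch of why records that update, diverge and merge make identifier resolution time-dependent: if a knowledge base keeps a history of which identifiers replaced which, an old identifier can still be followed to whatever is live today. The history table and identifiers below are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical history: retired identifier -> identifiers that replaced it.
# One successor models a merge or rename; several successors model a split.
REPLACED_BY: Dict[str, List[str]] = {
    "R1": ["R3"],        # R1 and R2 were merged into R3
    "R2": ["R3"],
    "R3": ["R4", "R5"],  # R3 later diverged into R4 and R5
}


def current_ids(identifier: str) -> Set[str]:
    """Follow the replacement history to the identifiers that are live now."""
    live: Set[str] = set()
    seen: Set[str] = set()
    stack = [identifier]
    while stack:
        ident = stack.pop()
        if ident in seen:
            continue  # guard against cycles in a faulty history
        seen.add(ident)
        successors = REPLACED_BY.get(ident)
        if successors:
            stack.extend(successors)
        else:
            live.add(ident)
    return live
```

An analysis that cited R1 now resolves to two records, which is one way a result can stop being reproducible unless the release it used is also recorded.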

Page 25

Legacy of Open PHACTS. Mappings are first class.

Data record content

mappings between entities

linksets

provenance, versioning, mapping linksets

Page 26

VoID – Vocabulary of Interlinked Datasets

• Create description of a Linkset that connects two datasets.

• Select datasets from existing descriptions.

• Capture link predicate and justification
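
As an illustration of what such a description might contain, the sketch below emits a minimal Turtle document for a linkset. The `void:Linkset` class and the `void:subjectsTarget`, `void:objectsTarget` and `void:linkPredicate` properties are from the VoID vocabulary; the `ex:justification` property, the `ex:` namespace and all dataset URIs are hypothetical placeholders, since the slide does not say how the justification is recorded.

```python
VOID = "http://rdfs.org/ns/void#"
EX = "https://example.org/vocab#"  # hypothetical namespace for the justification


def linkset_turtle(linkset: str, subjects_target: str, objects_target: str,
                   link_predicate: str, justification: str) -> str:
    """Serialise a minimal VoID Linkset description as Turtle text."""
    # Escape backslashes and double quotes for a Turtle string literal.
    escaped = justification.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"@prefix void: <{VOID}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects_target}> ;",
        f"    void:objectsTarget <{objects_target}> ;",
        f"    void:linkPredicate <{link_predicate}> ;",
        f'    ex:justification "{escaped}" .',
    ]
    return "\n".join(lines) + "\n"


# Placeholder URIs; the link predicate is SKOS exactMatch.
EXAMPLE = linkset_turtle(
    linkset="https://example.org/linkset/a-b",
    subjects_target="https://example.org/dataset/a",
    objects_target="https://example.org/dataset/b",
    link_predicate="http://www.w3.org/2004/02/skos/core#exactMatch",
    justification="same entity according to a shared cross-reference",
)

if __name__ == "__main__":
    print(EXAMPLE)
```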

Page 27

Legacy of Open PHACTS.

Releasing Data Sets: Software-Like Research Objects

Linked Data Manifests

“Publishing data the software way”

Controlled data distribution

Containers

Builds

Dependencies

Versioning

Verification

data-maven-plugin

Docker
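
The versioning and verification ideas can be sketched with a checksum manifest: each release records a version and a hash per file, and a consumer recomputes the hashes before using the data. This is an assumed, simplified manifest layout for illustration, not the format used by data-maven-plugin or any Open PHACTS release.

```python
import hashlib
from typing import Dict, List


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(version: str, files: Dict[str, bytes]) -> dict:
    """Record a release version and one checksum per file."""
    return {
        "version": version,
        "checksums": {name: sha256_hex(data) for name, data in files.items()},
    }


def verify(manifest: dict, files: Dict[str, bytes]) -> List[str]:
    """Return names of files that are missing or whose checksum differs."""
    problems = []
    for name, expected in manifest["checksums"].items():
        data = files.get(name)
        if data is None or sha256_hex(data) != expected:
            problems.append(name)
    return sorted(problems)


if __name__ == "__main__":
    release = {"genes.ttl": b"example triples", "links.ttl": b"example links"}
    manifest = build_manifest("1.0.0", release)
    tampered = dict(release, **{"links.ttl": b"changed"})
    print(verify(manifest, release))   # nothing to report
    print(verify(manifest, tampered))  # reports the changed file
```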

Page 28

Genotype-Phenotype

Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, et al. (2015) Finding Our Way through Phenotypes. PLoS Biol 13(1): e1002033. doi:10.1371/journal.pbio.1002033

http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.1002033

Mapping terms

Cross linking datasets

Tracking provenance

Linked Data Services

Page 29

Publishing FAIR Data

Interoperating Applications

Interoperability Backbone

Interoperability Services Backbone

Page 30

Linked Data – Big Picture

• lower the barriers to linking data

• connect related data that wasn't previously linked

• self-describe and annotate data in a common, machine readable form

• expose linking as a first class information element

“a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” (Wikipedia)

Page 31

Impact of Open PHACTS on ELIXIR Linked Data

Components & Know-how

• Identifiers & Links

• Annotation & Ontologies

• Dataset Containers

• Integrate into off the shelf apps

Publishing and Consuming

• Metadata & Mappings

• On boarding & Release pipelines

• APIs, Search

Data … when it supports interoperability … retain native forms … preparation and maintenance … data governance …

Page 32

Challenges of Linked Data

Getting data providers to generate LOD

Getting agreement on URIs

Choosing ontologies and relations

Modelling challenges (data vs biological reality)

Appropriate Extract/Load/Transform pipelines

Appropriate representation for datatypes

Getting machine readable dataset descriptions

Expertise in the community to effectively produce/consume LD

Services for finding and reusing URIs & ontologies

Data annotation services (mapping data to ontologies)

Provide an API

Link resources to ontology terms

SPARQL fetish

Page 33

[Mons]

Page 34

What is ELIXIR?

Page 35

Human data: The European Genome-phenome Archive EGA