Building a repository of biomedical ontologies with Neo4j

Simon Jupp

Samples, Phenotypes and Ontologies Team

European Bioinformatics Institute

Cambridge, UK.

The challenge - thousands of data attributes…

• European Archive for molecular data

• ENA, EVA, EGA, BioSample, ArrayExpress

• How do we make sense of the data?

• SPOT team builds tools to support the mapping of this data to ontologies and other standards

Why we need terminology standards (or ontologies)

Dyschromatopsia

Search PubMed for “color blindness”

Search PubMed for “Dyschromatopsia”

Search PubMed for "abnormality of the eye"

The ontology of color blindness

HP:0011518 (Dichromacy)

HP:0011518 (Eye)

HP:0000551 (Abnormality of color vision)

HP:0007641 (Dyschromatopsia)

Is-a

Is-a

Disease-location

Ontology powered applications

Query expansion in the Gene Expression Atlas – searching “eye disease” finds genes expressed in “Turner syndrome”

https://www.ebi.ac.uk/gxa/home
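Query expansion of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Gene Expression Atlas implements it: the is-a edges below are a toy hierarchy built from the labels on the earlier slides, and the function names are made up for this example.

```python
from collections import defaultdict

# Toy is-a edges as (child, parent). Illustrative only: this is NOT the
# real Human Phenotype Ontology hierarchy.
IS_A = [
    ("dichromacy", "dyschromatopsia"),
    ("dyschromatopsia", "abnormality of color vision"),
    ("abnormality of color vision", "abnormality of the eye"),
]


def build_children(edges):
    """Index the edges as parent -> set of direct children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def expand_query(term, children):
    """Return the query term plus every term below it in the hierarchy."""
    expanded, stack = {term}, [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


children = build_children(IS_A)
print(sorted(expand_query("abnormality of color vision", children)))
```

A search for a broad term then also matches data annotated with any of its more specific terms, which is the behaviour described on the slide.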

Ontology powered applications

Visualising Gene-Disease associations in Open Targets

https://www.opentargets.org

Ontology powered applications

SNP – trait associations in the GWAS catalog

All traits mapped to disease, phenotype and measurements in EFO

https://www.ebi.ac.uk/gwas/

Cardiovascular disease traits

Genotype Phenotype

Sequence

Proteins

Gene products Transcript

Pathways

Cell type

BRENDA tissue / enzyme source

Development

Anatomy

Phenotype

Plasmodium life cycle

-Sequence types and features

-Genetic Context

-Molecule role

-Molecular Function

-Biological process

-Cellular component

-Protein covalent bond

-Protein domain

-UniProt taxonomy

-Pathway ontology

-Event (INOH pathway ontology)

-Systems Biology

-Protein-protein interaction

-Arabidopsis development

-Cereal plant development

-Plant growth and developmental stage

-C. elegans development

-Drosophila development (FBdv)

-Human developmental anatomy, abstract version

-Human developmental anatomy, timed version

-Mosquito gross anatomy

-Mouse adult gross anatomy

-Mouse gross anatomy and development

-C. elegans gross anatomy

-Arabidopsis gross anatomy

-Cereal plant gross anatomy

-Drosophila gross anatomy

-Dictyostelium discoideum anatomy

-Fungal gross anatomy (FAO)

-Plant structure

-Maize gross anatomy

-Medaka fish anatomy and development

-Zebrafish anatomy and development

-NCI Thesaurus

-Mouse pathology

-Human disease

-Cereal plant trait

-PATO (attribute and value)

-Mammalian phenotype

-Human phenotype

-Habronattus courtship

-Loggerhead nesting

-Animal natural history and life history

eVOC (Expressed Sequence Annotation for Humans)

Ontologies for life sciences

Ontology Lookup Service

• Ontology search engine

• Ontology term history tracking

• Ontology visualisation

• Powerful RESTful API

Repository of over 160 pre-selected biomedical ontologies (4.5 million terms, 11 million relationships)

http://www.ebi.ac.uk/ols

• Provides a unified mechanism to access multiple ontologies

• Large community of users (~5,000 per month, hundreds of millions of hits per month)

• Open source and dockerised

Ontology visualisation tools

Build process

Nightly crawl of all registered ontologies

Multiple indexes created with standalone Spring Boot applications

API and website run with Spring Data

https://ebispot.github.io

Open Source Software

Loading ontologies into Neo4j

• Ontologies usually published in W3C OWL format

• RDF based (so already a graph)

• …but not a very friendly graph for our use-cases (more on this later this afternoon)

• Primary OLS use-cases for a graph

  • Term hierarchy (parent/child)

  • Simple view over other relationships

    • Part of, develops from

  • Extracting subgraphs/subsets

    • e.g. taxon specific subsets

OWL to Neo4j schema

Every term is a node with a label for each ontology

Each relationship and subset relation is labeled (is-a, part-of, develops-from etc.)

Powerful yet simple queries

• Get the transitive closure for “heart” following parent and partonomy relations from the UBERON anatomy ontology

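// Walk up SUBCLASSOF/RelatedTree edges from n, then pick up the siblings under each ancestor.
// {0} and {1} are positional parameters: the ontology name and the term IRI.
MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
WHERE n.ontology_name = {0} AND n.iri = {1}
RETURN path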
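For readers without a Neo4j instance to hand, the ancestor part of that traversal can be approximated in plain Python. This is a minimal sketch: the edge list is invented toy data rather than real UBERON content, the relationship type names are simply reused from the query above, and `ancestors` is a hypothetical helper rather than part of OLS.

```python
from collections import deque

# Toy edges as (source, relationship type, target). Invented for illustration.
EDGES = [
    ("heart", "SUBCLASSOF", "organ"),
    ("heart", "RelatedTree", "cardiovascular system"),  # e.g. a part-of link
    ("cardiovascular system", "SUBCLASSOF", "anatomical system"),
    ("organ", "SUBCLASSOF", "anatomical structure"),
    ("lung", "SUBCLASSOF", "organ"),
]


def ancestors(start, edges, rel_types=("SUBCLASSOF", "RelatedTree")):
    """Transitive closure: every node reachable from start over rel_types."""
    outgoing = {}
    for src, rel, dst in edges:
        if rel in rel_types:
            outgoing.setdefault(src, []).append(dst)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for parent in outgoing.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("heart", EDGES)))
print(sorted(ancestors("heart", EDGES, rel_types=("SUBCLASSOF",))))
```

Restricting `rel_types` to the subclass relation gives the plain parent hierarchy, while including the second type also follows partonomy-style links, which mirrors the choice of relationship types in the Cypher pattern.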

Ontology Mappings

• We now have too many ontologies with overlapping scope!

• Millions of mappings exist to interlink the ontologies

[Figure labels: Datasource 1, Datasource 2, Human Phenotype Ontology, SNOMED-CT, Mappings, Xref]

Ontology Mapping Service (OxO)

• New database of mappings built with Neo4j

• Crawls OLS ontologies and UMLS for mappings and provides UI and API to access all known mappings

http://www.ebi.ac.uk/spot/oxo (went live March 2017)

Exploring the Xref graph

• We build a graph in Neo4j of known xrefs

• Direct mappings to NCIt “Retinoblastoma” from Disease ontology (DO) and EFO

Discover new mappings

• If we traverse 1 hop in the graph we can infer more mappings

1 hop
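The idea of inferring mappings by walking the xref graph can be sketched as a bounded breadth-first search. This is an illustration under assumptions, not the OxO implementation: the identifiers are placeholders (not real accessions), xrefs are treated as undirected, and "distance" here simply counts edges, so a direct mapping is distance 1 and a mapping found through one intermediate term is distance 2.

```python
from collections import deque

# Placeholder xrefs: these are NOT real ontology identifiers.
XREFS = [
    ("DO:retinoblastoma", "NCIt:retinoblastoma"),
    ("EFO:retinoblastoma", "NCIt:retinoblastoma"),
    ("NCIt:retinoblastoma", "UMLS:retinoblastoma"),
]


def build_graph(xrefs):
    """Treat every xref as an undirected edge."""
    graph = {}
    for a, b in xrefs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mappings_within(term, graph, max_distance):
    """Return {mapped term: distance} for terms at most max_distance edges away."""
    distances = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the requested distance
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[term]
    return distances


graph = build_graph(XREFS)
print(mappings_within("DO:retinoblastoma", graph, 1))
print(mappings_within("DO:retinoblastoma", graph, 2))
```

Keeping the distance alongside each result lets a curator see which mappings were asserted directly and which were only inferred through an intermediate term.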

Problems with mappings

• But this exposes inconsistencies in public mappings

• Use this as basis for fixing and confirming mappings
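One simple check that could feed such a review is sketched below. It rests on an assumption the slides do not state: here a mapping is treated as suspicious when one source term maps to more than one term in the same target ontology. The identifiers are placeholders and `ambiguous_mappings` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder mappings as (source term, target term). Not real identifiers.
MAPPINGS = [
    ("EFO:a", "NCIt:x"),
    ("EFO:a", "NCIt:y"),
    ("EFO:b", "NCIt:z"),
    ("EFO:b", "DO:q"),
]


def prefix(curie):
    """Return the ontology prefix of a 'PREFIX:local' identifier."""
    return curie.split(":", 1)[0]


def ambiguous_mappings(mappings):
    """Flag (source, target ontology) pairs that have more than one target term."""
    targets = defaultdict(lambda: defaultdict(set))
    for source, target in mappings:
        targets[source][prefix(target)].add(target)

    flagged = {}
    for source, by_ontology in targets.items():
        for ontology, terms in by_ontology.items():
            if len(terms) > 1:
                flagged[(source, ontology)] = sorted(terms)
    return flagged


print(ambiguous_mappings(MAPPINGS))
```

A flagged pair is not necessarily wrong (one-to-many mappings can be legitimate), so the output is best read as a worklist for curators rather than a list of errors.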

Conclusion

• Neo4j is being adopted in multiple projects across this institute

• Liked because it provides a simple and effective solution to some of our data modelling challenges

• Neo4j is a good fit for working with ontologies and taxonomic data

• Excellent developer integration for building applications, e.g. Spring-data-neo4j

Ontology team

Helen Parkinson, Tony Burdett

Sira Sarntivijai, Olga Vrousgou, Thomas Liener

Funding

• EMBL

• CORBEL: This project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654248.

• EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.

Predicting annotation

• We do a lot of data curation with ontologies

• Need better support for mapping prediction

• E.g. samples like these are usually annotated with these terms

• Need species specificity, e.g. only mapping plant samples with plant ontology terms

Input from submission      Ontology class
2’-deoxy-5-azacytidine     5-aza-2’-deoxycytidine
Ovarian Cancer             ovarian carcinoma
Anterior tibialis          tibialis anterior
Endothelium, Vascula       cardiovascular system endothelium

Tagging with ontologies

• We have built a large corpus of known mappings between “data values” and ontology terms

• Piloting building a recommendation engine for our curation tools with Neo4j
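A very small version of such a recommender can be sketched as a frequency count over the corpus, filtered by species. This is an assumption-laden illustration, not the piloted system: apart from the “Ovarian Cancer” / “ovarian carcinoma” pair taken from the table above, the corpus rows, the species field and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical curated corpus: (submitted value, species, ontology term).
CORPUS = [
    ("Ovarian Cancer", "human", "ovarian carcinoma"),
    ("ovarian cancer", "human", "ovarian carcinoma"),
    ("ovarian cancer", "human", "ovarian neoplasm"),
    ("leaf", "plant", "leaf"),
]


def normalise(value):
    """Case-fold and trim so trivially different submissions match."""
    return value.strip().lower()


def recommend(value, species, corpus, top_n=3):
    """Rank terms previously used for this value in samples of this species."""
    wanted = normalise(value)
    counts = Counter(
        term
        for submitted, sample_species, term in corpus
        if normalise(submitted) == wanted and sample_species == species
    )
    return counts.most_common(top_n)


print(recommend("Ovarian Cancer ", "human", CORPUS))
print(recommend("leaf", "human", CORPUS))
```

Filtering on species before counting is one way to express the species-specificity requirement, so that a plant term is never suggested for a human sample even when the submitted text matches.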
