Ontologies: Neo4j graph workshop, Berlin
Building a repository of biomedical ontologies with Neo4j
Simon Jupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
The challenge: thousands of data attributes…
• European archives for molecular data
• ENA, EVA, EGA, BioSample, ArrayExpress
• How do we make sense of the data?
• The SPOT team builds tools to support the mapping of this data to ontologies and other standards
The ontology of color blindness
Figure: HP:0011518 (Dichromacy), HP:0007641 (Dyschromatopsia) and HP:0000551 (Abnormality of color vision), linked by is-a relations and a disease-location relation to Eye
Ontology-powered applications
Query expansion in the Gene Expression Atlas: searching “eye disease” finds genes expressed in “Turner syndrome”
https://www.ebi.ac.uk/gxa/home
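Query expansion relies on the hierarchy. A search for a term also matches data annotated with any of that term's descendants. Below is a minimal in-memory sketch of the idea. It reuses the color-vision terms from the earlier slide, but the exact is-a chain and the experiment annotations are assumptions for illustration. It is not how the Expression Atlas implements expansion.

```python
from collections import defaultdict, deque

# Assumed is-a edges (child, parent), for illustration only.
IS_A = [
    ("HP:0011518", "HP:0007641"),  # Dichromacy is-a Dyschromatopsia
    ("HP:0007641", "HP:0000551"),  # Dyschromatopsia is-a Abnormality of color vision
]

# Hypothetical annotations: experiment id -> ontology term.
ANNOTATIONS = {
    "experiment-1": "HP:0011518",
    "experiment-2": "HP:0007641",
    "experiment-3": "UBERON:0000948",  # heart: outside the color-vision branch
}


def descendants(term, is_a):
    """Return the term plus every term reachable by walking is-a edges downwards."""
    children = defaultdict(list)
    for child, parent in is_a:
        children[parent].append(child)
    found, queue = {term}, deque([term])
    while queue:  # breadth-first walk down the hierarchy
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def expanded_search(term, is_a, annotations):
    """Experiments annotated with the query term or any of its descendants."""
    terms = descendants(term, is_a)
    return sorted(exp for exp, t in annotations.items() if t in terms)


print(expanded_search("HP:0000551", IS_A, ANNOTATIONS))  # ['experiment-1', 'experiment-2']
```

A search for the broad term "Abnormality of color vision" finds both experiments, even though neither was annotated with that term directly.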
Ontology-powered applications
Visualising Gene-Disease associations in Open Targets
https://www.opentargets.org
Ontology-powered applications
SNP-trait associations in the GWAS Catalog
All traits are mapped to disease, phenotype and measurement terms in EFO
https://www.ebi.ac.uk/gwas/
Cardiovascular disease traits
Ontologies for life sciences
Figure: ontologies arranged from genotype to phenotype, grouped as Sequence, Proteins, Gene products, Transcript, Pathways, Cell type, BRENDA tissue / enzyme source, Development, Anatomy, Phenotype and Plasmodium life cycle. Ontologies shown:
-Sequence types and features
-Genetic Context
-Molecule role
-Molecular Function
-Biological process
-Cellular component
-Protein covalent bond
-Protein domain
-UniProt taxonomy
-Pathway ontology
-Event (INOH pathway ontology)
-Systems Biology
-Protein-protein interaction
-Arabidopsis development
-Cereal plant development
-Plant growth and developmental stage
-C. elegans development
-Drosophila development (FBdv)
-Human developmental anatomy, abstract version
-Human developmental anatomy, timed version
-Mosquito gross anatomy
-Mouse adult gross anatomy
-Mouse gross anatomy and development
-C. elegans gross anatomy
-Arabidopsis gross anatomy
-Cereal plant gross anatomy
-Drosophila gross anatomy
-Dictyostelium discoideum anatomy
-Fungal gross anatomy (FAO)
-Plant structure
-Maize gross anatomy
-Medaka fish anatomy and development
-Zebrafish anatomy and development
-NCI Thesaurus
-Mouse pathology
-Human disease
-Cereal plant trait
-PATO (attribute and value)
-Mammalian phenotype
-Human phenotype
-Habronattus courtship
-Loggerhead nesting
-Animal natural history and life history
-eVOC (Expressed Sequence Annotation for Humans)
Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms, 11 million relationships)
http://www.ebi.ac.uk/ols
• Provides a unified mechanism to access multiple ontologies
• Large community of users (~5,000 users per month, hundreds of millions of hits per month)
• Open source and dockerised
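Below is a small sketch of how a client might build requests for the REST API without making any network calls. The base path is an assumption: it was used by OLS around the time of this talk, and later OLS releases may serve the API elsewhere. The detail worth showing is that the term endpoint expects the term IRI to be URL-encoded twice. UBERON_0000948 is the UBERON term for heart.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: API base path used by OLS at the time of this talk;
# later OLS releases may serve the API under a different path.
OLS_API = "https://www.ebi.ac.uk/ols/api"


def term_url(ontology: str, iri: str) -> str:
    """URL for one term; the term IRI is URL-encoded twice, as OLS expects."""
    encoded = quote(quote(iri, safe=""), safe="")
    return f"{OLS_API}/ontologies/{ontology}/terms/{encoded}"


def search_url(query: str, ontology: Optional[str] = None) -> str:
    """URL for a free-text search, optionally restricted to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_API}/search?{urlencode(params)}"


print(term_url("uberon", "http://purl.obolibrary.org/obo/UBERON_0000948"))
print(search_url("heart", "uberon"))
```

The URLs can be fetched with any HTTP client. Keeping URL construction separate makes the double encoding easy to test.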
Build process
Nightly crawl of all registered ontologies
Multiple indexes created with standalone Spring Boot applications
API and website run with Spring Data
https://ebispot.github.io
Open Source Software
Loading ontologies into Neo4j
• Ontologies are usually published in the W3C OWL format
• RDF-based (so already a graph)
• …but not a very friendly graph for our use-cases (more on this, this afternoon)
• Primary OLS use-cases for a graph:
  • Term hierarchy (parent/child)
  • Simple view over other relationships, e.g. part of, develops from
  • Extracting subgraphs/subsets, e.g. taxon-specific subsets
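In OWL's RDF serialization, a statement such as "heart is part of some cardiovascular system" is not a direct edge. The class is a subClassOf an anonymous owl:Restriction blank node, which in turn carries owl:onProperty and owl:someValuesFrom triples. The sketch below collapses these into the direct edges the use-cases above need. It assumes the triples are already parsed into tuples and handles only someValuesFrom restrictions. The `ex:` identifiers are hypothetical, and this illustrates the idea rather than the OLS loader.

```python
# Toy RDF triples for one OWL class; "ex:" identifiers are hypothetical.
TRIPLES = [
    ("ex:heart", "rdfs:subClassOf", "ex:organ"),
    ("ex:heart", "rdfs:subClassOf", "_:b0"),  # blank node = anonymous restriction
    ("_:b0", "rdf:type", "owl:Restriction"),
    ("_:b0", "owl:onProperty", "ex:part_of"),
    ("_:b0", "owl:someValuesFrom", "ex:cardiovascular_system"),
]


def flatten(triples):
    """Turn subClassOf triples into direct (child, relation, parent) edges.

    Named superclasses become SUBCLASSOF edges; an existential restriction
    (subClassOf [onProperty P; someValuesFrom C]) becomes a direct P edge to C.
    Other anonymous class expressions are skipped in this sketch.
    """
    on_property = {s: o for s, p, o in triples if p == "owl:onProperty"}
    some_values = {s: o for s, p, o in triples if p == "owl:someValuesFrom"}
    edges = []
    for s, p, o in triples:
        if p != "rdfs:subClassOf" or s.startswith("_:"):
            continue
        if not o.startswith("_:"):
            edges.append((s, "SUBCLASSOF", o))
        elif o in on_property and o in some_values:
            edges.append((s, on_property[o], some_values[o]))
    return edges


for edge in flatten(TRIPLES):
    print(edge)
```

Five triples and a blank node reduce to two labeled edges from heart. That is the shape a property graph can traverse directly.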
OWL to Neo4j schema
Every term is a node, with a label for each ontology
Each relationship and subset relation is labeled (is-a, part-of, develops-from, etc.)
Powerful yet simple queries
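• Get the transitive closure (all ancestors) of “heart” following parent and partonomy relations in the UBERON anatomy ontology, together with the direct children of each ancestor
// $ontology_name and $iri replace the positional {0} and {1} placeholders (e.g. 'uberon' and the IRI of heart)
MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
WHERE n.ontology_name = $ontology_name AND n.iri = $iri
RETURN path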
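The pattern walks one or more SUBCLASSOF/RelatedTree hops from the term to each ancestor, then one hop back down to that ancestor's direct children. Together these are the nodes a collapsible tree view needs. Below is an in-memory Python sketch of the set of nodes appearing in the returned paths, over a hypothetical toy graph with invented `ex:` identifiers. In practice the Cypher runs against Neo4j; this only mirrors its semantics.

```python
from collections import deque

TREE_TYPES = {"SUBCLASSOF", "RelatedTree"}

# Hypothetical edges (child, relationship type, parent).
EDGES = [
    ("ex:heart", "SUBCLASSOF", "ex:organ"),
    ("ex:heart", "RelatedTree", "ex:cardiovascular_system"),
    ("ex:lung", "SUBCLASSOF", "ex:organ"),
    ("ex:organ", "SUBCLASSOF", "ex:anatomical_entity"),
    ("ex:blood_vessel", "RelatedTree", "ex:cardiovascular_system"),
    ("ex:heart_valve", "RelatedTree", "ex:heart"),
]


def tree_nodes(start, edges):
    """Nodes in the query's paths: start, its ancestors, and their direct children."""
    parents = {}
    for child, rel, parent in edges:
        if rel in TREE_TYPES:
            parents.setdefault(child, set()).add(parent)
    # Ancestors reachable in one or more hops (the variable-length part of the pattern).
    ancestors, queue = set(), deque([start])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in ancestors:
                ancestors.add(parent)
                queue.append(parent)
    if not ancestors:
        return set()  # no parent, so the MATCH pattern finds no path at all
    # Direct children of any ancestor (the `sibling` end of the pattern).
    children = {c for c, rel, p in edges if rel in TREE_TYPES and p in ancestors}
    return {start} | ancestors | children


print(sorted(tree_nodes("ex:heart", EDGES)))
```

Note that `ex:heart_valve`, a child of heart itself, is not returned: the pattern only looks at children of ancestors. A root term returns nothing.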
Ontology Mappings
• We now have too many ontologies, with overlapping scope!
• Millions of mappings exist to interlink the ontologies
Figure: mappings (xrefs) linking Datasource 1, the Human Phenotype Ontology, to Datasource 2, SNOMED-CT
Ontology Mapping Service (OxO)
• New database of mappings built with Neo4j
• Crawls OLS ontologies and UMLS for mappings, and provides a UI and an API to access all known mappings
• Went live in March 2017
http://www.ebi.ac.uk/spot/oxo
Exploring the Xref graph
• We build a graph in Neo4j of known xrefs
• Direct mappings to NCIt “Retinoblastoma” from the Disease Ontology (DO) and EFO
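Below is a sketch of exploring such an xref graph. Xrefs are treated as undirected edges, and a breadth-first search from one term finds every term reachable within a hop limit, together with its distance. The identifiers are placeholders for the slide's retinoblastoma terms, plus one hypothetical MeSH term to show a longer chain. The `max_hops` parameter is an assumption for illustration, not OxO's API.

```python
from collections import defaultdict, deque

# Placeholder identifiers: the slide's three retinoblastoma terms plus one
# hypothetical MeSH term, to show a longer chain.
XREFS = [
    ("DO:retinoblastoma", "NCIt:Retinoblastoma"),
    ("EFO:retinoblastoma", "NCIt:Retinoblastoma"),
    ("DO:retinoblastoma", "MeSH:retinoblastoma"),
]


def mappings_within(start, xrefs, max_hops):
    """Terms reachable from start through at most max_hops xrefs, with their distance."""
    graph = defaultdict(set)
    for a, b in xrefs:  # xrefs are treated as undirected edges
        graph[a].add(b)
        graph[b].add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


print(mappings_within("EFO:retinoblastoma", XREFS, 2))
```

EFO and DO never map to each other directly, yet they meet at distance 2 through NCIt. Indirect mappings like this become visible once the xrefs live in a graph.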
Problems with mappings
• But the graph also exposes inconsistencies in public mappings
• We use this as a basis for fixing and confirming mappings
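Below is one possible heuristic for surfacing such inconsistencies; it is hypothetical and not OxO's actual check. Group terms into connected components of the xref graph using union-find. If a component contains two different terms from the same ontology, the mappings chain those terms together, which usually deserves a curator's attention. All identifiers are invented.

```python
from collections import defaultdict

# Invented identifiers: EFO:term_a and EFO:term_b end up chained via NCIt and DO.
MAPPINGS = [
    ("EFO:term_a", "NCIt:term_x"),
    ("DO:term_y", "NCIt:term_x"),
    ("DO:term_y", "EFO:term_b"),
    ("HP:term_p", "SNOMEDCT:term_q"),
]


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def conflicting_mappings(xrefs):
    """Per component, the ontologies contributing more than one term."""
    parent = {}
    for a, b in xrefs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    components = defaultdict(set)
    for term in parent:
        components[find(parent, term)].add(term)
    flagged = []
    for members in components.values():
        by_ontology = defaultdict(set)
        for term in members:
            by_ontology[term.split(":", 1)[0]].add(term)  # prefix = ontology
        clashes = {o: ts for o, ts in by_ontology.items() if len(ts) > 1}
        if clashes:
            flagged.append(clashes)
    return flagged


print(conflicting_mappings(MAPPINGS))
```

A flagged component does not prove an error. Two EFO terms may legitimately map to a broader NCIt term, so the output is a list of candidates to confirm or fix.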
Conclusion
• Neo4j is being adopted in multiple projects across the institute
• Liked because it provides a simple and effective solution to some of our data modelling challenges
• Neo4j is a good fit for working with ontologies and taxonomic data
• Excellent developer integration for building applications, e.g. Spring Data Neo4j
Ontology team
Helen Parkinson, Tony Burdett
Sira Sarntivijai, Olga Vrousgou, Thomas Liener
Funding
• EMBL
• CORBEL: this project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654248.
• EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.
Predicting annotation
• We do a lot of data curation with ontologies
• Need better support for mapping prediction
• e.g. samples like these are usually annotated with these terms
• Need species specificity, e.g. only mapping plant samples to plant ontology terms
Input from submission → Ontology class
2’-deoxy-5-azacytidine → 5-aza-2’-deoxycytidine
Ovarian Cancer → ovarian carcinoma
Anterior tibialis → tibialis anterior
Endothelium, Vascula → cardiovascular system endothelium
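Below is a minimal sketch of annotation prediction with species specificity, under stated assumptions. First it reuses an earlier curation of the same value. Otherwise it falls back to fuzzy matching with Python's `difflib` against ontology labels allowed for the sample's taxon. Values are normalized by lower-casing, dropping punctuation and sorting words, so "Anterior tibialis" and "tibialis anterior" compare equal. The class-to-taxon table and the past curations are hypothetical, loosely based on the rows above. This is not the SPOT team's tool.

```python
import re
from difflib import get_close_matches


def normalise(text):
    """Lower-case, keep only letters and digits, and sort the words."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))


def suggest(value, sample_taxon, classes, past_annotations, cutoff=0.85):
    """Suggest an ontology class label for a submitted value, or None.

    classes: class label -> set of taxa the class may be used for (hypothetical).
    past_annotations: earlier curated value -> class label (hypothetical).
    """
    # Species specificity: only consider classes valid for this sample's taxon.
    allowed = {label for label, taxa in classes.items() if sample_taxon in taxa}
    # 1. Reuse an earlier curation of the same normalised value.
    for seen, label in past_annotations.items():
        if normalise(seen) == normalise(value) and label in allowed:
            return label
    # 2. Fall back to fuzzy string matching against the allowed labels.
    by_norm = {normalise(label): label for label in allowed}
    match = get_close_matches(normalise(value), list(by_norm), n=1, cutoff=cutoff)
    return by_norm[match[0]] if match else None


CLASSES = {
    "5-aza-2’-deoxycytidine": {"human", "plant"},
    "ovarian carcinoma": {"human"},
    "tibialis anterior": {"human"},
    "cardiovascular system endothelium": {"human"},
}
PAST = {
    "2’-deoxy-5-azacytidine": "5-aza-2’-deoxycytidine",
    "Ovarian Cancer": "ovarian carcinoma",
}

print(suggest("Anterior tibialis", "human", CLASSES, PAST))  # tibialis anterior
```

Synonyms such as the two drug names cannot be found by string similarity, which is why past curations come first. For a plant sample, "Ovarian Cancer" returns None because no allowed class fits.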