Ontologies: Neo4j graph workshop, Berlin

Posted on 16-Mar-2018

  • Building a repository of biomedical ontologies with Neo4j

    Simon Jupp

    Samples, Phenotypes and Ontologies Team

    European Bioinformatics Institute

    Cambridge, UK.

  • The challenge: thousands of data attributes

    European archives for molecular data

    ENA, EVA, EGA, BioSample, ArrayExpress

    How do we make sense of the data?

    The SPOT team builds tools to support the mapping of this data to ontologies and other standards

  • Why we need terminology standards (or ontologies)

    Dyschromatopsia

  • Search PubMed for color blindness

  • Search PubMed for Dyschromatopsia

  • Search PubMed for "abnormality of the eye"
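The searches above differ only in the name used for the same concept. Below is a minimal Python sketch of how an ontology term can act as the common key: every label and synonym is folded onto one term ID before searching. The synonym list attached to HP:0007641 is an illustrative assumption, not an extract from HPO.

```python
from __future__ import annotations

# Illustrative synonym table (assumption): several surface forms that
# name the same concept, keyed by a single ontology term ID.
SYNONYMS: dict[str, list[str]] = {
    "HP:0007641": ["Dyschromatopsia", "color blindness", "colour blindness"],
}


def build_lookup(synonyms: dict[str, list[str]]) -> dict[str, str]:
    """Map every case-folded label or synonym to its term ID."""
    lookup: dict[str, str] = {}
    for term_id, names in synonyms.items():
        for name in names:
            lookup[name.casefold()] = term_id
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def to_term_id(query: str) -> str | None:
    """Resolve a free-text query to a term ID, or None if it is unknown."""
    return LOOKUP.get(query.strip().casefold())


if __name__ == "__main__":
    for q in ["Color Blindness", "dyschromatopsia", "abnormality of the eye"]:
        print(q, "->", to_term_id(q))
```

Searching by the resolved ID rather than the raw string means "color blindness" and "Dyschromatopsia" retrieve the same records. "abnormality of the eye" resolves to nothing here: broader terms need the ontology hierarchy, not just synonyms.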

  • The ontology of color blindness

    Diagram: HP:0011518 (Dichromacy), HP:0007641 (Dyschromatopsia) and HP:0000551 (Abnormality of color vision), connected by is-a relations, with a disease-location relation to Eye

  • Ontology powered applications

    Query expansion in the Gene Expression Atlas: searching for eye disease finds genes expressed in Turner syndrome

    https://www.ebi.ac.uk/gxa/home
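Query expansion of this kind can be sketched as "a term plus all of its descendants". The Python below is a minimal illustration, not the Expression Atlas implementation. The is-a edges among the color-vision terms above are assumed for the example (Dichromacy under Dyschromatopsia under Abnormality of color vision), and the dataset annotations, including the `OTHER:0000001` ID, are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Assumed is-a edges (child -> parents), for illustration only.
PARENTS: dict[str, list[str]] = {
    "HP:0011518": ["HP:0007641"],  # Dichromacy is-a Dyschromatopsia
    "HP:0007641": ["HP:0000551"],  # Dyschromatopsia is-a Abnormality of color vision
}

# Hypothetical datasets, each annotated with one ontology term.
ANNOTATIONS: dict[str, str] = {
    "dataset-1": "HP:0011518",
    "dataset-2": "HP:0007641",
    "dataset-3": "OTHER:0000001",  # unrelated placeholder term
}


def children_index(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert child -> parents edges into parent -> children edges."""
    children: dict[str, list[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    return children


def descendants_and_self(term: str, children: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk down the hierarchy, collecting every subclass."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expanded_search(term: str) -> list[str]:
    """Datasets annotated with the query term or any of its descendants."""
    terms = descendants_and_self(term, children_index(PARENTS))
    return sorted(d for d, t in ANNOTATIONS.items() if t in terms)


if __name__ == "__main__":
    print(expanded_search("HP:0000551"))
```

A search for the broad term also returns records annotated only with its more specific subclasses, which is what lets a general query surface data annotated with narrower terms.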

  • Ontology powered applications

    Visualising Gene-Disease associations in Open Targets

    https://www.opentargets.org

  • Ontology powered applications

    SNP-trait associations in the GWAS Catalog: all traits are mapped to disease, phenotype and measurement terms in EFO

    Figure: cardiovascular disease traits

    https://www.ebi.ac.uk/gwas/

  • Ontologies for life sciences

    Figure: ontologies arranged from genotype to phenotype, grouped by sequence, proteins, gene products, transcript, pathways, cell type, BRENDA tissue / enzyme source, development, anatomy, phenotype and Plasmodium life cycle. Ontologies shown include:

    - Sequence types and features
    - Genetic context
    - Molecule role
    - Molecular function
    - Biological process
    - Cellular component
    - Protein covalent bond
    - Protein domain
    - UniProt taxonomy
    - Pathway ontology
    - Event (INOH pathway ontology)
    - Systems Biology
    - Protein-protein interaction
    - Arabidopsis development
    - Cereal plant development
    - Plant growth and developmental stage
    - C. elegans development
    - Drosophila development (FBdv)
    - Human developmental anatomy, abstract version
    - Human developmental anatomy, timed version
    - Mosquito gross anatomy
    - Mouse adult gross anatomy
    - Mouse gross anatomy and development
    - C. elegans gross anatomy
    - Arabidopsis gross anatomy
    - Cereal plant gross anatomy
    - Drosophila gross anatomy
    - Dictyostelium discoideum anatomy
    - Fungal gross anatomy (FAO)
    - Plant structure
    - Maize gross anatomy
    - Medaka fish anatomy and development
    - Zebrafish anatomy and development
    - NCI Thesaurus
    - Mouse pathology
    - Human disease
    - Cereal plant trait
    - PATO (attribute and value)
    - Mammalian phenotype
    - Human phenotype
    - Habronattus courtship
    - Loggerhead nesting
    - Animal natural history and life history
    - eVOC (Expressed Sequence Annotation for Humans)

  • Ontology Lookup Service

    Ontology search engine

    Ontology term history tracking

    Ontology visualisation

    Powerful RESTful API

    Repository of over 160 pre-selected biomedical ontologies (4.5 million terms, 11 million relationships)

    http://www.ebi.ac.uk/ols

    Provides a unified mechanism to access multiple ontologies

    Large community of users (~5,000 users per month, hundreds of millions of hits per month)

    Open source and dockerised
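To illustrate the RESTful API, here is a small Python helper that only builds request URLs (it makes no network calls). The HTTPS base path, the `/ontologies/{ontology}/terms/{iri}` and `/search?q=` routes, and the double URL-encoding of term IRIs reflect our understanding of the OLS API documentation and are assumptions to check against the current docs; newer deployments may use a different base path.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

# Assumed API base; the slide gives http://www.ebi.ac.uk/ols.
OLS_API = "https://www.ebi.ac.uk/ols/api"


def term_url(ontology: str, iri: str) -> str:
    """URL for a single term; the IRI is URL-encoded twice (assumption)."""
    encoded = quote(quote(iri, safe=""), safe="")
    return f"{OLS_API}/ontologies/{ontology}/terms/{encoded}"


def search_url(query: str, ontology: str | None = None) -> str:
    """URL for a free-text search, optionally restricted to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_API}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(term_url("uberon", "http://purl.obolibrary.org/obo/UBERON_0000948"))
    print(search_url("heart", "uberon"))
```

Encoding the IRI with `safe=""` matters: a full IRI contains `/` characters, which would otherwise be read as extra path segments.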

  • Ontology visualisation tools

  • Build process

    Nightly crawl of all registered ontologies

    Multiple indexes created with standalone Spring Boot applications

    API and website run with Spring Data

    Open source software: https://ebispot.github.io

  • Loading ontologies into Neo4j

    Ontologies are usually published in the W3C OWL format

    RDF-based (so already a graph), but not a very friendly graph for our use cases (more on this later this afternoon)

    Primary OLS use cases for a graph:

    Term hierarchy (parent/child)

    Simple view over other relationships, e.g. part of, develops from

    Extracting subgraphs/subsets, e.g. taxon-specific subsets

  • OWL to Neo4j schema

    Every term is a node, with a label for each ontology

    Each relationship and subset relation is labeled (is-a, part-of, develops-from, etc.)
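OWL existential restrictions show why RDF is not a friendly graph for these use cases: "heart part_of some cardiovascular system" is serialised through an anonymous blank node rather than a direct edge. Below is a minimal Python sketch of the collapsing step the schema implies, using simplified triples with hypothetical shortened IRIs (`ex:` prefix). The `SUBCLASSOF` edge name follows the Cypher query in this deck; naming other edges after their OWL property is an assumption.

```python
from __future__ import annotations

# Simplified (subject, predicate, object) triples as they might appear in an
# RDF serialisation of OWL. "_:b0" is a blank node standing for the class
# expression "part_of some cardiovascular_system". IRIs are hypothetical.
TRIPLES = [
    ("ex:heart", "rdf:type", "owl:Class"),
    ("ex:heart", "rdfs:label", "heart"),
    ("ex:heart", "rdfs:subClassOf", "ex:organ"),
    ("ex:heart", "rdfs:subClassOf", "_:b0"),
    ("_:b0", "rdf:type", "owl:Restriction"),
    ("_:b0", "owl:onProperty", "ex:part_of"),
    ("_:b0", "owl:someValuesFrom", "ex:cardiovascular_system"),
]


def is_blank(node: str) -> bool:
    return node.startswith("_:")


def to_property_graph(triples, ontology: str):
    """Collapse OWL triples into labeled nodes and direct, typed edges."""
    # Index each blank-node restriction: blank id -> {predicate: object}.
    restrictions: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        if is_blank(s):
            restrictions.setdefault(s, {})[p] = o

    nodes: dict[str, dict] = {}
    edges: list[tuple[str, str, str]] = []

    def node(iri: str) -> dict:
        # Every named term gets the generic Class label plus its ontology label.
        return nodes.setdefault(iri, {"labels": {"Class", ontology}})

    for s, p, o in triples:
        if is_blank(s):
            continue  # restrictions are folded into edges, not kept as nodes
        current = node(s)
        if p == "rdfs:label":
            current["label"] = o
        elif p == "rdfs:subClassOf" and not is_blank(o):
            node(o)
            edges.append((s, "SUBCLASSOF", o))
        elif p == "rdfs:subClassOf":
            r = restrictions.get(o, {})
            if "owl:onProperty" in r and "owl:someValuesFrom" in r:
                node(r["owl:someValuesFrom"])
                edges.append((s, r["owl:onProperty"], r["owl:someValuesFrom"]))
    return nodes, edges


if __name__ == "__main__":
    for edge in to_property_graph(TRIPLES, "uberon")[1]:
        print(edge)
```

The blank node disappears and the restriction becomes a single `heart -[part_of]-> cardiovascular_system` edge, which is what makes parent/child and partonomy traversal a simple path query.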

  • Powerful yet simple queries

    Get the transitive closure for heart, following parent and partonomy relations, from the UBERON anatomy ontology

    MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)
    WHERE n.obo_id = 'UBERON:0000948' // heart; property name assumed
    RETURN path
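The Cypher query above can also be sketched in plain Python: a breadth-first walk upwards that follows only the chosen relationship types, the analogue of `[:SUBCLASSOF|RelatedTree*]`. The edge list is a hypothetical, heavily simplified slice of an anatomy ontology, not real UBERON content.

```python
from __future__ import annotations

from collections import deque

# Hypothetical (child, relation, parent) edges; illustrative only.
EDGES = [
    ("heart", "SUBCLASSOF", "organ"),
    ("heart", "part_of", "cardiovascular system"),
    ("organ", "SUBCLASSOF", "anatomical structure"),
    ("cardiovascular system", "SUBCLASSOF", "anatomical system"),
    ("heart", "develops_from", "heart primordium"),
]


def ancestors(start: str, edges, follow: set[str]) -> set[str]:
    """Transitive closure upwards from `start`, using only `follow` relations."""
    up: dict[str, list[str]] = {}
    for child, rel, parent in edges:
        if rel in follow:
            up.setdefault(child, []).append(parent)
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for parent in up.get(queue.popleft(), []):
            if parent not in seen:  # also guarantees termination on cycles
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("heart", EDGES, {"SUBCLASSOF", "part_of"})))
```

Changing the `follow` set plays the same role as changing the relationship types in the Cypher pattern: here `develops_from` is excluded, so "heart primordium" is not returned.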

  • Ontology Mappings

    We now have too many ontologies, with overlapping scope

    Millions of mappings exist to interlink the ontologies

    Diagram: datasource 1 (Human Phenotype Ontology) linked to datasource 2 (SNOMED-CT) via mappings (xrefs)

  • Ontology Mapping Service (OxO)

    New database of mappings built with Neo4j

    Crawls OLS ontologies and UMLS for mappings, and provides a UI and API to access all known mappings

    Went live in March 2017

    http://www.ebi.ac.uk/spot/oxo

  • Exploring the Xref graph

    We build a graph of known xrefs in Neo4j

    Direct mappings to NCIt Retinoblastoma from the Disease Ontology (DO) and EFO

  • Discover new mappings

    If we traverse one hop in the graph, we can infer more mappings
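The inference step can be sketched as a breadth-first search over an undirected xref graph: terms reachable through an intermediate term, but not mapped directly, become candidate mappings. This Python sketch uses placeholder term IDs (strings such as `DO:retinoblastoma` are not real identifiers), and treating every xref as symmetric is a simplifying assumption.

```python
from __future__ import annotations

from collections import deque

# Placeholder xrefs (not real identifiers): DO and EFO both map to NCIt,
# and DO additionally maps to a UMLS concept.
XREFS = [
    ("DO:retinoblastoma", "NCIt:retinoblastoma"),
    ("EFO:retinoblastoma", "NCIt:retinoblastoma"),
    ("DO:retinoblastoma", "UMLS:retinoblastoma"),
]


def build_graph(xrefs) -> dict[str, set[str]]:
    """Undirected adjacency sets: each xref links two terms both ways."""
    graph: dict[str, set[str]] = {}
    for a, b in xrefs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def infer_mappings(graph, term: str, extra_hops: int = 1) -> set[str]:
    """Terms within 1 + extra_hops steps of `term` that are not direct xrefs."""
    max_depth = 1 + extra_hops
    depth = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    # Depth 1 = existing direct mappings; depth >= 2 = inferred candidates.
    return {t for t, d in depth.items() if d >= 2}


if __name__ == "__main__":
    print(infer_mappings(build_graph(XREFS), "EFO:retinoblastoma"))
```

Because BFS records the shortest distance, a term that is already mapped directly is never reported as inferred, even if a longer path to it also exists.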

  • Problems with mappings

    But this also exposes inconsistencies in public mappings

    We use this as a basis for fixing and confirming mappings
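One simple way to surface such inconsistencies is sketched below. It is offered as an illustrative heuristic, not the team's actual method. The idea is to walk the xref graph from a term and flag any ontology in which the term reaches more than one distinct term, since an equivalence-style mapping would normally land on a single term per ontology. Term IDs are placeholders, and reading the ontology from the ID prefix is an assumption.

```python
from __future__ import annotations

from collections import deque

# Placeholder xrefs: EFO:a reaches two different DO terms via two routes.
XREFS = [
    ("EFO:a", "NCIt:x"),
    ("NCIt:x", "DO:p"),
    ("EFO:a", "MeSH:m"),
    ("MeSH:m", "DO:q"),
]


def build_graph(xrefs) -> dict[str, set[str]]:
    """Undirected adjacency sets built from xref pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in xrefs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, term: str, max_hops: int) -> set[str]:
    """All terms within max_hops of `term`, excluding `term` itself."""
    depth = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if depth[current] < max_hops:
            for neighbour in graph.get(current, ()):
                if neighbour not in depth:
                    depth[neighbour] = depth[current] + 1
                    queue.append(neighbour)
    return set(depth) - {term}


def conflicts(graph, term: str, max_hops: int = 2) -> dict[str, set[str]]:
    """Ontology prefixes in which `term` reaches more than one distinct term."""
    by_prefix: dict[str, set[str]] = {}
    for other in reachable(graph, term, max_hops):
        by_prefix.setdefault(other.split(":", 1)[0], set()).add(other)
    return {prefix: terms for prefix, terms in by_prefix.items() if len(terms) > 1}


if __name__ == "__main__":
    print(conflicts(build_graph(XREFS), "EFO:a"))
```

Flagged cases are candidates for a curator to confirm or fix, not automatic errors: a one-to-many mapping can be legitimate when one ontology is more granular than another.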

  • Conclusion

    Neo4j is being adopted in multiple projects across the institute

    It is liked because it provides a simple and effective solution to some of our data modelling challenges

    Neo4j is a good fit for working with ontologies and taxonomic data

    Excellent developer integration for building applications, e.g. Spring Data Neo4j

  • Ontology team

    Helen Parkinson, Tony Burdett, Sira Sarntivijai, Olga Vrousgou, Thomas Liener

    Funding

    EMBL

    CORBEL: This project receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654248.

    EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.

  • Predicting annotation

    We do a lot of data curation with ontologies

    We need better support for mapping prediction, e.g. samples like these are usually annotated with these terms

    We need species specificity, e.g. only mapping plant samples to plant ontology terms

    Input from submission → Ontology class
    2-deoxy-5-azacytidine → 5-aza-2-deoxycytidine
    Ovarian Cancer → ovarian carcinoma
    Anterior tibialis → tibialis anterior
    Endothelium, Vascula → cardiovascular system endothelium
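Some of the differences in the examples above come down to word order or punctuation. Below is a minimal Python sketch of one common string-matching trick, token-sort similarity using the standard library's `difflib`. It is chosen for illustration and is not necessarily what the SPOT tools use; the candidate labels are the ontology classes listed above.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Candidate ontology class labels, taken from the examples above.
LABELS = [
    "5-aza-2-deoxycytidine",
    "ovarian carcinoma",
    "tibialis anterior",
    "cardiovascular system endothelium",
]


def token_key(text: str) -> str:
    """Lower-case, keep alphanumeric tokens, sort them, re-join with spaces."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores word order and punctuation."""
    return SequenceMatcher(None, token_key(a), token_key(b)).ratio()


def best_match(value: str, labels: list[str]) -> tuple[str, float]:
    """Return the highest-scoring label for a submitted value."""
    return max(((label, similarity(value, label)) for label in labels),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    for value in ["Anterior tibialis", "Ovarian Cancer"]:
        print(value, "->", best_match(value, LABELS))
```

Sorting tokens makes "Anterior tibialis" and "tibialis anterior" identical. Pairs that differ in vocabulary (e.g. "Endothelium, Vascula" versus "cardiovascular system endothelium") still need synonyms or curated mappings, which is where the corpus of known mappings comes in.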

  • Tagging with ontologies

    We have built a large corpus of known mappings between data values and ontology terms

    We are piloting a recommendation engine for our curation tools, built with Neo4j
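The recommendation idea, combined with the species constraint mentioned under predicting annotation, can be sketched as frequency counting over the corpus of known mappings. The counts are then filtered by which ontologies are acceptable for the sample's species. This is a hypothetical in-memory Python sketch rather than the Neo4j-based engine being piloted: the corpus entries, the plant species set and the prefix rules are all invented for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical corpus of previously curated (data value, ontology term) pairs.
# Term strings are placeholders of the form "PREFIX:label", not real IDs.
CORPUS = [
    ("Ovarian Cancer", "EFO:ovarian carcinoma"),
    ("ovarian cancer", "EFO:ovarian carcinoma"),
    ("ovarian cancer", "EFO:ovarian neoplasm"),
    ("root", "PO:plant root"),
    ("root", "PO:plant root"),
    ("root", "UBERON:root of tooth"),
    ("root", "UBERON:root of tooth"),
    ("root", "UBERON:root of tooth"),
]

# Hypothetical species rule: plant samples may only use plant ontology terms.
PLANT_SPECIES = {"Arabidopsis thaliana", "Zea mays"}


def allowed_prefixes(species: str) -> set[str]:
    return {"PO"} if species in PLANT_SPECIES else {"EFO", "UBERON"}


def build_index(corpus) -> dict[str, Counter]:
    """Count how often each normalised value was annotated with each term."""
    index: dict[str, Counter] = defaultdict(Counter)
    for value, term in corpus:
        index[value.strip().casefold()][term] += 1
    return index


INDEX = build_index(CORPUS)


def recommend(value: str, species: str, k: int = 3) -> list[str]:
    """Most frequent past annotations for `value`, restricted by species."""
    counts = INDEX.get(value.strip().casefold(), Counter())
    allowed = allowed_prefixes(species)
    ranked = [term for term, _ in counts.most_common()
              if term.split(":", 1)[0] in allowed]
    return ranked[:k]


if __name__ == "__main__":
    print(recommend("root", "Arabidopsis thaliana"))
    print(recommend("Ovarian cancer", "Homo sapiens"))
```

The species filter overrides raw frequency: for a plant sample, the plant term is suggested even though another term was used more often for the same value overall.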