pistoia alliance debates: ontologies mapping webinar 23rd feb 2017
TRANSCRIPT
Ontologies Mapping for more effective data integration and
knowledge management
A Pistoia Alliance Debates Webinar, 23rd February 2017
Chaired by Ian Harrow
This webinar is being recorded
Poll Question 1:
What is your level of familiarity/involvement with Ontologies?
A. I lead Ontologies work in my organization
B. I contribute to Ontologies work in my organization
C. I have a basic understanding of Ontologies
D. I know very little about Ontologies
Pistoia Alliance
Chair and Expert Panel
Ontologies Mapping webinar, February 23, 2017
Yasmin Alam-Faruque, Scientific Data Specialist at Eagle Genomics
• Organisation, harmonisation and integration of datasets for the eaglecore knowledge management platform
• Previously, biocurator at EMBL-EBI for the renal gene ontology annotation initiative
Simon Jupp, Ontology Project Lead at EMBL-EBI
• Developed a range of ontologies and ontology services including the Experimental Factor Ontology and the Ontology Lookup Service
• Working with ontologies in the life sciences since 2003
Martin Romacker, Principal Scientist at Roche Innovation Center, Basel
• Data and Information Architect in Pharma Research and Early Development Informatics
• Focusing on Knowledge Engineering (Terminologies/Ontologies) and Scientific Data Integration using Semantic Technologies
Ian Harrow, Project Manager at Pistoia Alliance (Chair)
• Consultant services in Bioinformatics and Text Mining
• Project Manager for the Ontologies Mapping project
• Previously, Senior Principal Scientist in Bioinformatics at Pfizer
Lee Harland, Founder and COO at SciBite
• SciBite is a growing company based in Cambridge UK specialising in Text Analytics and Knowledge Management for life sciences
• Previously, CTO of the Open PHACTS project and head of the information engineering group at Pfizer
Agenda
Panelist: Question
Ian Harrow: Welcome
Martin Romacker: Why are ontologies important for Roche? What has been achieved by the Ontologies Mapping project?
Yasmin Alam-Faruque: What is the value of ontologies to Eagle Genomics? How has being part of the OM project team helped?
Lee Harland: How do ontologies power the SciBite platform?
Simon Jupp: What ontology services are available at EMBL-EBI?
Ian Harrow: What is the Ontologies Mapping project planning to do next?
Audience Q & A
Why are ontologies important for Roche?
What has been achieved by the project and its value to Roche?
Martin Romacker at Roche Innovation Center
Changing Perception of Corporate Data Assets
• Pharma Industry is behind other industries (e.g. Finance, Insurance, Automotive, Wholesale, Retailer – CDO/CAO)
• Paradigm shift – from lab to data/knowledge?
  Data is business and business is data – acquisition of data not compounds (Google: data, algorithms, computer)
• Change only happens where the Pharma Industry is forced – why? CDISC, IDMP (heavily relying on ontologies and data standards)
• Pharma Industry accepts an incredible variety of data as input into knowledge-driven business processes (e.g. CROs, vendor data, cost avoidance)
• Pharma Industry spends huge budgets to generate knowledge - budgets are tight for integration, maintenance and quality assurance
pREDi Terminology Service (RTS)
• RTS as domain master for terminology management
  – streamlining terminology management, ensuring high data quality
  – semantic alignment between knowledge repositories, lowering barriers
• Faster response to scientific queries (saving time)
  Less effort for data integration (cost avoidance)
• Support of external collaborations based on data standards (trend CROs)
• Support of well-founded decisions – business or scientific
• Semantic Engineering to define research/business objects
• USP: Comprehensive semantic model to represent highly-scalable, universal and multi-purpose terminologies
pREDi Terminology Service (2015-2016)
MasterTerminos: 21 (18)
Applications: 33 (23)
AppTerminos: 300 (220)
Concepts: 110k (95k)
Synonyms: 300k (265k)
[Chart labels: pRED 21; Pharma (PD/PT) 3; GPS 6; Partnering 1; Diagnostics 2]
Ontologies & Data Standards: Value Proposition
Source: https://www.crowdflower.com/the-data-behind-todays-data-scientists-an-infographic/ and https://whatsthebigdata.com/2016/05/01/data-scientists-spend-most-of-their-time-cleaning-data/
• Data-Science-Readiness (Time-to-Value)
• Reduced Effort for Data Integration
• Improved Data Quality (completeness, correctness, coherence)
Scientific Data Integration
Big Data - Semantics as a Key Enabler
[Figure labels: Velocity, Volume, Veracity, Variety, Value]
Prime Time for Ontologies
• Executives consider data more and more as a corporate asset
• Integration of Real World Data and Health Care Data
• Translational and Reverse Translational Data Integration
• Collaboration with Contract Research Organisations
• But:
  – Missing or competing standards in Research & Development (e.g. MeSH, SNOMED, MedDRA, NCIt)
  – Legacy systems using own terminologies/ontologies (terminologies/ontologies are ubiquitous but not managed as such)
• Urgent need for Ontologies Mapping
Roche funding of Pistoia Ontologies Mapping Project Phase 1 & 2
Roche are committed to funding the proposed Phase 3
Project Phase 1&2: Timeline and Achievements
[Timeline axis: 2Q, 3Q, 4Q 2015; 1Q, 2Q, 3Q, 4Q 2016]
1) Ontologies domain selected as “test case”
2) Guidelines for minimal standards & best practices
3) Requirements for Ontologies Mapping tool
4) Evaluate & select existing Ontologies Mapping tool(s)
5) Requirements for an Ontologies Mapping service
6) Understand the demand for an Ontologies Mapping service
Funded by GSK, Merck & Co, Novartis, Roche and BIOVIA 3DS
Further Achievements
• Conformity with the FAIR principles
  – Findable and Accessible (public wiki)
  – Interoperable and Re-usable (aligned to OBO etc.)
• Endorsed by external groups:
  – Interoperable Services at ELIXIR, Molecular Archival Resources at EMBL-EBI, Ontologies Mapping Project Community of Interest
• Promotion at conferences/workshops:
  – EMBL-EBI March 2016, ISMB July 2016, ECCB September 2016, Industry Semantic Forum at Roche September 2016, OM October 2016 and ISWC October 2016
• Ontology Alignment Evaluation Initiative
  – Sponsoring of a competition for the best ontologies matching algorithm (International Workshop on Ontology Matching)
OM Project Community
Funders
• BIOVIA 3DS
• GSK
• Merck & Co
• Novartis
• Roche
Pistoia Operations
• Richard Holland
• John Wise
• Carmen Nitsche
• Nick Lynch
Project team
• Ian Harrow (Pistoia Project Manager)
• Martin Romacker (Roche)
• Andrea Splendiani (Novartis)
• Stefan Negru (Merck & Co)
• Peter Woollard (GSK)
• Scott Markel (BIOVIA)
• Martin Koch (Osthus)
• Heiner Oberkampf (Osthus)
• Yasmin Alam-Faruque (Eagle Genomics)
• Erfan Younesi (Bayer)
• Jabe Wilson (Elsevier)
• James Malone (FactBio)
Community of Interest (>80 members)
Ontologies Mapping Project: Value to Roche
• Phase 1 & 2 Achievements are highly relevant:
  – Guidelines for selection of reference standards
  – Analysis of available tools for ontologies mapping (RFI: baseline, checklist for tools)
  – Evaluation of ontologies mapping algorithms (linkage to OM algorithm community)
• Phase 3 Ontologies Mapping Service proposal (later by Ian)
  – Important for mapping requests (e.g. HPO to MeSH)
  – Important application to semantic alignment
  – Shared resources across scientific community
Conclusion
• Tremendous change: data are considered as an asset
• Urgent need for lowering the barriers for data integration and data sharing
• Demystify knowledge acquisition process
  – Define as knowledge procurement process
• Terminologies/Ontologies and related Data Standards start to play a key role
  – but: getting them into business still requires tenacity
  – but: tackle the issue from the value perspective
• Ontologies mapping is a core capability to work efficiently and successfully with corporate data assets
  – This is why Roche consumes and funds the OM project
What is the value of ontologies to Eagle Genomics?
How has being part of the project team helped?
Yasmin Alam-Faruque at Eagle Genomics
Supporting the bridge between Data and Insight
Eagle Genomics provides software solutions bridging the gap between “big data” and “innovative biological insight”
Ontologies are essential: they allow disparate data to be harmonised, federated and integrated for various high performance computational analyses (i.e. data processing, statistical analyses and data mining) -> novel insights
Data curation
• We also play an active role in curating, organising and federating a variety of customer multi-omics datasets and associated metadata into a knowledge management platform (eaglecore).
• Curation of scientific data involves its collection, characterisation, cleaning, contextualisation, categorisation and cataloguing, making it more visible and available for searching, sharing and further analyses.
• Hence, using ontologies during curation to semantically enrich and harmonise the datasets becomes essential for data integration and interoperability.
Data valuation
• Eagle Genomics pioneers measurement of data value (i.e. usefulness and relevance) in the context of specific scientific questions.
• Value modeling requires data harmonisation using ontologies.
• We can measure the value of data before the use of ontologies and after, according to quality metrics and value metrics.
[Figure: Dataset Catalogue -> Dataset Catalogue (improved quality) -> Dataset Catalogue (improved value). Labels: Ontologies; Improvement; Quality metrics (descriptive statistics); Value metrics (AHP, QFD)]
Data Governance
[Figure: Governance by design. Labels: Governance; Validity; Consistency; Goals; Measurement; Processes, Organisations, standards, guidelines; Architecture, Data & contextual models, semantics; “Are we doing the right things?”; “Are we doing the things right?”]
• Emerging as an important activity for biopharma and healthcare industries
• Complex initiative: relates to the validity and consistency throughout the organisation
• Ensuring everyone refers to the same drug or disease across all organisational departments/sites (R&D -> clinical trials -> sale of drug to treat disease) is essential.
• Can be initiated by using ontologies/controlled vocabularies to tag and link experiments/datasets
How has being part of the project team helped?
Allowed visibility - played an active role throughout the project which has projected a serious and professional image among other organisational team members
Provided an overall increase in our expertise, understanding and capability within this important field
Credibility with potential customers/clients as we are heavily involved in this important community project along with other Pistoia member organisations
Opportunity to become aware of the evaluation and selection of the best potential academic/ commercial Ontology mapping tool/ service provider for future customer projects, ahead of the project starting – saving time.
Opportunity to be involved in the development of various documentation:
• detailing the functional and non-functional requirements for an Ontologies Mapping Tool
• Ontology mapping guidelines (already comprehensively followed by some ontologies)
Poll Question 2:
Where do you source mappings between ontologies?
A. Mostly external sources of mappings
B. Mostly internal curation of mappings
C. A mixture of both external and internal sources
D. I do not know
How do ontologies power the SciBite platform?
Lee Harland at SciBite
Ontologies In The SciBite Platform
Lee Harland | @SciBitely | www.scibite.com
80-90% of all potentially usable business information may originate in unstructured form
https://en.wikipedia.org/wiki/Unstructured_data
‘Semantics-as-a-Service’
[Figure: Text Content (Documents & Databases) + Ontologies + SciBite API -> Structured Data]
Ontologies: Gene/Disease/Drug; Molecular; Chemical; Clinical; Adverse Event; Pharm Sci & Manufacturing; Business & Commercial; Regulatory; Geo-location; University/Company
Public Ontologies Are Vital
What They Are Great For
• Providing an open, consistent, stable identifier for a given “thing”
• Developing community consensus as to what that “thing” is
• Developing community consensus on what all the things are
• Powering Data Integration
• Powering Scientific Analytics
Not Designed For Text Analytics/Mining
3 Key Issues
1. Synonym Coverage
2. Coding Style
3. Ambiguity
e.g. Human Phenotype Ontology (HPO) is a gold reference standard for phenotypes and many use cases start with “find all the phenotypes….”
But the current HPO has 6997 synonyms across 11375 entities. Many other ontologies are similar, as synonym coverage is not their raison d'être.
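The HPO figures quoted on this slide can be turned into a simple coverage ratio. The sketch below only divides the two numbers from the slide; the function name is illustrative, and the result is an average that says nothing about how many entities have no synonym at all.

```python
# Counts quoted on the slide for the Human Phenotype Ontology (HPO).
hpo_synonyms = 6997
hpo_entities = 11375


def synonyms_per_entity(synonyms: int, entities: int) -> float:
    """Average number of synonyms recorded per ontology entity."""
    return synonyms / entities


ratio = synonyms_per_entity(hpo_synonyms, hpo_entities)
print(f"{ratio:.2f} synonyms per entity")  # prints "0.62 synonyms per entity"
```

On average that is fewer than one synonym per entity, which is thin for text mining, where the same phenotype is often written in several different ways.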
Ontology Engineering
[Figure: Raw Ontologies (Public, SciBite, Customer) -> Expert Curation; Automated Learning, Enrichment & Curation; Training, Testing, & Validation -> TERMite; Feedback To Producers]
SciBite Ontology Enrichment*
[Figure: chart with series labels “SciBite” and “Original” for HPO, MeSH and HGNC]
* Actual search space many fold larger due to adaptive matching
Ontology-Driven Search
Ontology-Driven Search Everywhere!
[Figure: SciBite at the centre, serving the groups and uses below]
Groups:
• Researchers (End User)
• Information Management (Content Awareness)
• Informatics (Text Mining)
• System Developers (3rd Party Integration)
• Pharma’s Internal Apps
• Commercial Content/Software Providers
Uses:
• Competitor Intelligence
• Literature Awareness, Trends & Alerting
• Align 3rd Party Content
• Disease & Phenotype Networks, PPIs, Pharmacovigilance, Drug-Target/Biomarker Mining
• Patent Mining
• Machine Learning/AI
• Document Management, Enterprise & Local Search
• Ontology-XREF
• Semantic Auto-complete
• RDF & Integration
• Customer-Provider Integration
• Ontology-driven applications
• Smarter Apps!
Summary
• Text (Databases & Documents) accounts for a large amount of corporate “knowledge”
• Public & Internal Ontologies have great potential in structuring this text into minable data
• But these ontologies require significant processing, both human and automated, in order to make them “fit for purpose”
• Combine this with a fast, flexible, simple API and you can address a vast array of different use cases in
  – Software Vendors & Systems Developers
  – Content Providers
  – Data Scientists & Text Miners
What ontology services are available at EMBL-EBI?
Simon Jupp at EMBL-EBI
Ontologies at EMBL-EBI
[Figure: Biomedical ontologies linking applications and information]
Applications: EVA, Expression Atlas, GWAS catalog, Array Express
Information: Disease, BioAssays, Cell lines, Cell types, Small molecules, Evidence, Taxonomy, Drugs, Adverse events, Gene function, Plant anatomy, Mouse anatomy, Phenotype
The challenge - thousands of data attributes…
• Use the data to focus our curation efforts
  – For experimental data we focus on species, cell types, tissue types, disease state, phenotypes
  – Identify gaps in public ontologies
• Different requirements
  – High quality, manually curated resources e.g. GWAS catalog, OpenTargets
  – High throughput, automated curation e.g. archival resources like BioSamples
We build ontology aware applications
• Smarter searching
• Data analysis
• Data integration
• Data visualisation
Common questions
• How can I access ontologies?
• How do I map data to ontologies?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” applications?
• How should I publish my data?
We are building an Ontology Toolkit
• Search/Visualise ontologies: Ontology Lookup Service
• Annotate data: Zooma
• Ontology mappings: OxO
• Create new ontology content: Webulous
Ontology Lookup Service
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms)
http://www.ebi.ac.uk/ols
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
• Provides unified mechanism to access multiple ontologies
• Large community of users, 10s of millions of hits per month
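As a rough illustration of what calling the RESTful API might look like, the sketch below only builds a search URL and makes no network request. The base URL is the one on the slide; the "/api/search" path and the "q" and "ontology" parameter names are assumptions that should be checked against the OLS API documentation before use.

```python
from urllib.parse import urlencode

# Base URL from the slide. The "/api/search" path and the "q" and
# "ontology" parameter names are assumptions, not confirmed by the slides.
OLS_BASE = "http://www.ebi.ac.uk/ols"


def build_search_url(query: str, ontology: str = "") -> str:
    """Build a search URL, optionally restricted to a single ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_BASE}/api/search?{urlencode(params)}"


print(build_search_url("renal failure", "efo"))
```

Keeping URL construction separate from the HTTP call makes it easy to log or test the request before sending it from a pipeline.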
Zooma
Repository of curated ontology mappings
http://www.ebi.ac.uk/spot/zooma
• Optimal mappings based on data we have seen previously
• Favours precision over recall for use in automated pipelines
• Currently contains over 92,000 curated annotations from 7 resources
  – ClinVar, Cellular Phenotype Database, ExpressionAtlas, UniProt, GWAS, EBiSC, OpenTargets
  – Used to improve and share their mappings across resources
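The idea of favouring precision over recall can be sketched as a lookup that answers only when a value has been curated before. This is not Zooma's actual algorithm; the curated table, the term IDs and the function names below are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical curated annotations: free-text value -> ontology term ID.
# The IDs are illustrative placeholders, not real curated data.
CURATED = {
    "type 2 diabetes": "EX:0001",
    "heart attack": "EX:0002",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def map_annotation(text: str) -> Optional[str]:
    """Return a term ID only if this text was curated before, else None.

    Refusing to guess keeps precision high at the cost of recall.
    """
    return CURATED.get(normalise(text))


print(map_annotation("  Type 2   Diabetes "))  # EX:0001
print(map_annotation("diabetes"))              # None
```

Returning None rather than a best guess suits an automated pipeline: unmapped values can be queued for a curator instead of being silently mis-annotated.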
New for 2017 – Ontology Xrefs
• A lot of curator effort goes into building ontology cross-references
• Cross-references are a powerful tool for integrating data
[Figure: Data source 1 and Data source 2 linked by mappings between Human Phenotype Ontology and SNOMED-CT]
Ontology Mapping Service (OxO)
• New curation platform for community built mappings
• Seeded with mappings from OLS and other sources (UMLS, SNOMED)
• Normalised CURIE prefixes using identifiers.org
  – SNOMED-CT: / SNOMEDCT: / SNOMED: / SNOMEDCT_
• Provides a gold standard to support predictive mapping algorithms
http://www.ebi.ac.uk/spot/oxo (going live March 2017)
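Prefix normalisation of the kind listed on this slide can be sketched as a small rewrite step. The four variants come from the slide; picking "SNOMEDCT" as the canonical prefix is an assumption for illustration, and the numeric local IDs in the example are placeholders.

```python
# Prefix variants listed on the slide. The canonical form chosen here
# ("SNOMEDCT") is an assumption made for illustration only.
CANONICAL = "SNOMEDCT"
VARIANTS = ("SNOMED-CT:", "SNOMEDCT:", "SNOMEDCT_", "SNOMED:")


def normalise_curie(identifier: str) -> str:
    """Rewrite a known SNOMED prefix variant to CANONICAL:local_id."""
    upper = identifier.upper()
    for variant in VARIANTS:
        if upper.startswith(variant):
            return f"{CANONICAL}:{identifier[len(variant):]}"
    return identifier  # unknown prefix: leave untouched


for raw in ("SNOMED-CT:12345", "SNOMEDCT_12345", "snomed:12345", "HP:0000001"):
    print(raw, "->", normalise_curie(raw))
```

Once every source uses the same prefix, identifiers from different mapping sets can be compared as plain strings.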
Webulous – creating new ontology content
• Spreadsheet templates for adding new ontology content
  – Ontology “aware” for in sheet validation
  – Generic ontology building technology
• Works with Google sheets
[Figure: Webulous server exposes list of ontology design templates; populated templates converted to OWL; Webulous exports newly generated ontology]
http://www.ebi.ac.uk/efo/webulous
Putting it all together
• How can I access ontologies?
• How do I map data to ontologies?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” applications?
• How do I publish my annotations?
Ontology team
• Helen Parkinson
• Tony Burdett
• Sira Sarntivijai
• Olga Vrousgou
• Thomas Liener
What is the project planning to do next?
Ian Harrow at Pistoia Alliance
Why do we need Ontology Mapping?
Data domain example: Disease and Phenotype
[Figure: Ontology 1 <- Mapping 1-2 -> Ontology 2 <- Mapping 2-3 -> Ontology 3]
Mapping Tools and Services
• Higher scalability at reduced cost of maintenance
• A better engineering solution for application ontologies
• Expandable coverage
• + More…
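One reason pairwise mappings scale is that they can be chained: given Mapping 1-2 and Mapping 2-3, terms in Ontology 1 can be translated to Ontology 3 without curating a third mapping by hand. The sketch below assumes exact one-to-one equivalences; every identifier in it is a hypothetical placeholder.

```python
# Hypothetical mappings between three ontologies; all IDs are placeholders.
MAP_1_TO_2 = {"ONT1:a": "ONT2:x", "ONT1:b": "ONT2:y"}
MAP_2_TO_3 = {"ONT2:x": "ONT3:p"}


def compose(first: dict, second: dict) -> dict:
    """Chain two mappings, keeping only terms mapped at both steps."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


MAP_1_TO_3 = compose(MAP_1_TO_2, MAP_2_TO_3)
print(MAP_1_TO_3)  # {'ONT1:a': 'ONT3:p'}
```

Terms without a partner at either step simply drop out, and any error in one mapping carries through the chain, so composed mappings are best treated as candidates for review rather than curated facts.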
Proposal for an Ontology Mapping Service
• OLS (Ontology Lookup Service): 162 Ontologies
• OxO (Ontology Cross References, Mappings): Database of curated mappings sourced from public datasets
• ZOOMA (Mapping Tool): Mapping free text annotations to ontology terms based on a curated repository of annotation knowledge
Pistoia Alliance Prototype Ontologies Mapping Service (OMS):
• Develop an OMS to build on the existing Ontology Services at EMBL-EBI
• Evaluate value and quality of selected mappings in Disease & Phenotype domain
Proposed Deliverables and Timeline
[Timeline axis: Phase 2 (2016); Phase 3 preparatory (1Q 2017); Phase 3 (2Q, 3Q, 4Q 2017)]
• Requirements for an Ontologies Mapping service as a standard
• Promote and publicise prototype service
1) Start the prototype service to run for 6 months
2) 3 month review of service performance metrics
3) Complete prototype service and report performance metrics
Benefits and Support for Phase 3
Expected Benefits
• Evaluate value and quality of mappings between public disease & phenotype ontologies selected by funders
• Evaluate value and quality of public to internal mappings selected by funders
• Build on the database for public ontology mappings
• Extendible to any ontology hosted at EMBL-EBI
Call for Support
• Roche have committed funds and interest is growing
• Please contact [email protected] about support for the project
Now is a great time to join us!!!
Audience Q&A
Please use the Question function in GoToWebinar
Join the Deep Learning Hackathon, March 25-26, London
Help us show how Deep Learning can impact Life Sciences and Healthcare
Why attend?
• Create something that could make a life changing difference to human health.
• Job opportunities - meet and find out more about working with the pharma / healthcare industry. We will bring the companies to you. Your team mate at the event could be your future colleague!
• Win prizes and gain recognition - you can receive your prize at our conference in front of over 100 senior industry experts from R & D and IT.
• We hope you will make new connections from new areas and disciplines, perhaps even see new career directions you never thought possible.
• It’s fun!!!
http://www.pistoiaalliance.org/eventdetails/pistoia-alliance-hackathon
[email protected] @pistoiaalliance www.pistoiaalliance.org
Thank you all, you have been a great audience!